INTRODUCTION

The overall reference for most of these lectures is my book [1], Time's Arrows and Quantum
Measurement. Short excerpts from [1] appearing in these notes are used with permission of
Cambridge University Press.

The focus of this lecture series is time asymmetry, the "arrows of time." First we'll talk 
about the thermodynamic arrow: how things can behave irreversibly, despite time symmetry 
in the underlying dynamics. A good deal of effort will be spent developing models and tools
for this analysis. 

Then we'll examine another arrow, cosmological expansion, and discuss the hypothesis 
that the cosmological and thermodynamic arrows are related. This will introduce a more 
time-symmetric way of looking at things, and we'll talk about "two-time boundary conditions."
Having related thermodynamics and cosmology, there arises the possibility that there
are thermodynamics-based approaches to deducing the long-term future of our universe. 

Then a digression on the definition of entropy. If the second law (of thermodynamics) is 
fundamental, how can it depend on subjective definitions? The subjectivity I refer to arises 
in the distinction between "work" and "heat," or between macroscopic and microscopic. 
Here I'll use methods of non-equilibrium statistical mechanics, and (perhaps) surprisingly 
this will turn out to have implications for a practical contemporary issue: the identification 
of communities in a network. 

Finally, we'll take up quantum measurement theory. I'll tell you about a suggestion I've 
made for retaining unitary quantum time evolution throughout the "measurement" process 
and nevertheless avoiding superpositions of macroscopically different states. Although this 
resolution of the measurement problem tampers not at all with quantum mechanics, it plays 
havoc with the foundations of statistical mechanics. We are used to the idea that to calculate 
the future of a macroscopically defined system one assumes that all compatible microstates 
are present with equal probability. It is this assumption that is dropped, and I am hoping 
that the perspective you will have gained from the "time's arrows" part of these lectures will 
make this suspension less painful. Experimental tests of these ideas will also be discussed. 

I. IRREVERSIBILITY 

The problem is the apparent contradiction between microscopic reversibility and the 
macroscopic arrow. (Later I'll say why I don't think CP violation has any bearing on these 
issues.) 

A precise display of this contradiction comes from the Boltzmann H-Theorem. The
theorem is based on the equation of motion of the quantity $f(\mathbf{r},\mathbf{v},t)$, where
$f\,d^3r\,d^3v$ is the number of particles, at time $t$, in the volume $d^3r\,d^3v$ around
$(\mathbf{r},\mathbf{v})$, with $\mathbf{r}$ a point in coordinate space and $\mathbf{v}$ a point in
velocity space. Boltzmann found that $f$ satisfies the transport equation

$$\left(\frac{\partial}{\partial t} + \mathbf{v}_1\cdot\nabla_{\mathbf{r}} + \frac{\mathbf{F}}{m}\cdot\nabla_{\mathbf{v}_1}\right) f_1 = \int d\Omega \int d^3v_2\, \sigma(\Omega)\,|\mathbf{v}_1-\mathbf{v}_2|\,(f_2' f_1' - f_2 f_1). \tag{1}$$

The subscripts on $f$ refer to its arguments. The left hand side describes the "flow" of $f$ due
to the velocity of the particles and to any external forces, where $\mathbf{F}$ is an external force,
$m$ the molecular mass, and $f_1 = f(\mathbf{r},\mathbf{v}_1,t)$. On the right is the change in $f$ due
to scattering. As for $f_1$, the subscripts on the $f$'s in the integral are the subscripts on its
argument $\mathbf{v}$, and a prime indicates a primed velocity argument. The
quantity $\sigma(\Omega)$ is the center-of-mass scattering cross section for a pair of molecules whose
relative velocity goes from $\mathbf{v}_2 - \mathbf{v}_1$ to $\mathbf{v}_2' - \mathbf{v}_1'$. The scattering term
is derived by assuming that the scattering of particles 1 and 2 is independent of their previous
history. This is the assumption of molecular chaos, the Stosszahlansatz. Let $\mathbf{F} = 0$. Then we
can take $f$ independent of $\mathbf{r}$. Define

$$H(t) = \int d^3v\, f(\mathbf{v},t)\log f(\mathbf{v},t). \tag{2}$$

This is essentially the negative of an entropy. An immediate consequence of Eq. (1) is that
$dH/dt \le 0$. This is the H-theorem and it establishes that entropy never decreases, even though
the dynamics of the system is symmetric in time. (For a derivation of Eq. (1) and the proof
of the last assertion, see Huang [2]. Why H is a kind of entropy will be discussed in Sec. I D
below.)

There are two famous problems connected with this result. There is the reversibility 
paradox, associated with the name Loschmidt, and which is basically the conundrum of 
deriving something asymmetric from symmetric assumptions. It can be phrased as follows: 
Let the system go forward a time t and suppose that H has actually decreased. Now reverse 
the velocity of every particle. On the one hand, this system has the same dynamics as 
before, just different initial conditions. So H should continue to decrease (or at least not 
increase). But the reversal of velocities causes the system to retrace its path, returning to a 
higher value of H (its initial condition). 

The other problem is the recurrence paradox, based on a theorem of Poincare. For a 
system in a finite volume having a potential that is not too wild, the surface of constant 
energy in phase space has finite volume. Under these conditions almost all initial states 
eventually return (arbitrarily closely) to themselves. This means that although for a while 
H may decrease, eventually it must increase. 



A. The Kac ring model

Marc Kac invented a lovely model suitable for demonstrating the above paradoxes. He
also showed what it is you really can prove. To wit, although one particular system is
not guaranteed to satisfy the H-theorem, for ensembles of systems (and with appropriate
caveats) you really can prove the decrease of H, or increase of entropy.

We will work through this model in detail.

Consider a ring with N sites on it. See Fig. 1, where the sites are pictured as little cups.
In every site there is a ball, which can be either black or white. On each time step, every
ball moves counterclockwise to the next site. A subset of the sites are designated "active."
Let there be A of these "active" sites (the symbol A will also designate the set) and let them
have the following property: when a ball leaves an active site, it switches color, turning
white if it was black, black if it was white. In the figure N = 25 and A = 5.

Let $W(t)$ and $B(t)$ be the total numbers of white and black balls at time $t$. Let the number
of white balls in active sites at time $t$ be $W_A(t)$, and let $B_A(t)$ be correspondingly defined.
The dynamical scheme implies

$$W(t+1) = W(t) - W_A(t) + B_A(t) \tag{3}$$
$$B(t+1) = B(t) - B_A(t) + W_A(t). \tag{4}$$

This is exact. Define $\mu = A/N$, the active site fraction. For randomly distributed active
sites it is reasonable to assume that

$$\frac{\text{number of white balls in active sites}}{\text{number of white balls everywhere}} = \frac{\text{number of active sites}}{\text{number of all sites}}. \tag{5}$$

Thus $W_A(t)/W(t) = \mu$, and similarly for $B$. (This is the analog of Boltzmann's molecular
chaos assumption.) With the relation (5), Eqs. (3) and (4) become

$$W(t+1) = W(t) - \mu\,[W(t) - B(t)] \tag{6}$$
$$B(t+1) = B(t) - \mu\,[B(t) - W(t)] \tag{7}$$

Subtracting the second equation from the first, the normalized difference on time step $t+1$
is found to be

$$\delta(t+1) \equiv \frac{W(t+1) - B(t+1)}{N} = (1 - 2\mu)\,\delta(t).$$

It follows that $\delta(t) = (1 - 2\mu)^t\,\delta(0)$. The system goes gray exponentially rapidly, with
oscillation if $\mu > 1/2$.

FIG. 1: Kac ring model. Balls move counterclockwise from site to site. Some sites are active and
are marked at their bases. When a ball leaves an active site it changes color. Ball color is not
shown.
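The ring dynamics and the molecular-chaos prediction can be compared directly. The following is a minimal Python sketch, not taken from [1]: the function names (`make_ring`, `step`, `delta`), the choice of "site p to site p+1" for the counterclockwise move, and the ring size are all illustrative assumptions.

```python
import random

def make_ring(n_sites, n_active, rng):
    """Return (colors, active). colors[p] is +1 (white) or -1 (black);
    active[p] is True for the n_active randomly chosen active sites."""
    active = [False] * n_sites
    for p in rng.sample(range(n_sites), n_active):
        active[p] = True
    colors = [1] * n_sites            # all white, so delta(0) = 1
    return colors, active

def step(colors, active):
    """One time step: the ball at site p moves to site p+1 (mod N) and
    flips color if the site it leaves is active."""
    n = len(colors)
    new = [0] * n
    for p in range(n):
        new[(p + 1) % n] = -colors[p] if active[p] else colors[p]
    return new

def delta(colors):
    """Normalized difference (W - B) / N."""
    return sum(colors) / len(colors)

if __name__ == "__main__":
    rng = random.Random(0)
    N, A = 5000, 500
    mu = A / N
    colors, active = make_ring(N, A, rng)
    for t in range(0, 13):
        if t % 4 == 0:
            # simulated delta(t) next to the prediction (1 - 2 mu)^t
            print(t, round(delta(colors), 3), round((1 - 2 * mu) ** t, 3))
        colors = step(colors, active)
```

For a single ring the simulated value only fluctuates around $(1-2\mu)^t$; the agreement improves as N grows, which is the sense in which relation (5) is "reasonable."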

To discuss the paradoxes I'll define an entropy for this system and show that it is
monotonically increasing. The general principle for an entropy calculation (as we'll momentarily
establish in Sec. I D) is that it is the logarithm of the number of microscopic states consistent
with the given macroscopic state (N.B. the micro/macro distinction!). This leads to
two problems. One is traditional: counting. The other is seldom mentioned: how do you
define "macroscopic"? Later we will go into this problem, but for now we will take the
quantity just defined, $\delta$, as the macroscopic variable.

Entropy is thus given by

$$\frac{S}{k} = \log[\#\,\text{microstates consistent with } \delta \text{ (or } W)] = \log\binom{N}{W}, \tag{8}$$

where $k$ is a constant related to the base of the logarithms, and often, for historical (and
occasionally substantive [3]) reasons, given physical units. Using Stirling's approximation,
this yields

$$\frac{S}{kN} = -\frac{1+\delta}{2}\log\frac{1+\delta}{2} - \frac{1-\delta}{2}\log\frac{1-\delta}{2}, \tag{9}$$

which should be familiar as the information entropy (the amount of information missing,
per ball, if all you know is $\delta$). Cases: all white, $S = 0$; all gray, $S = kN\log 2$.

L S Schulman, Time-related issues ., SPhT & Technion lectures

FIG. 2: Entropy as a function of $\delta$.

To get a version of the H-theorem it is not necessary to differentiate $S$. The form of
$S$ as a function of $\delta$ is shown in Fig. 2. The maximum is at $\delta = 0$. Moreover, $|\delta|$ is a
monotonically decreasing function of $t$. It follows that $S$ is monotonically increasing. This
is our "H-theorem."
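The step from the exact count in Eq. (8) to the Stirling form in Eq. (9) can be checked numerically. This is a small sketch with $k = 1$; the function names are illustrative and `math.lgamma` is used to evaluate the logarithm of the binomial coefficient.

```python
import math

def entropy_exact(n, w):
    """Eq. (8) with k = 1: log of the binomial coefficient C(n, w)."""
    return math.lgamma(n + 1) - math.lgamma(w + 1) - math.lgamma(n - w + 1)

def entropy_stirling(n, d):
    """Eq. (9) with k = 1: N times the binary information entropy at delta = d."""
    s = 0.0
    for x in ((1 + d) / 2, (1 - d) / 2):
        if x > 0:                     # x log x -> 0 as x -> 0
            s -= x * math.log(x)
    return n * s

if __name__ == "__main__":
    n = 10000
    for w in (5000, 6000, 7500, 9000):
        d = (2 * w - n) / n           # delta = (W - B) / N with B = N - W
        print(d, round(entropy_exact(n, w), 2), round(entropy_stirling(n, d), 2))
```

The two columns differ by a term of order $\log N$, which is negligible per ball for large N.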



1. The paradoxes 



The reversibility paradox makes use of a process very similar to the one just described:
let the balls go clockwise and change color when entering a site. We could go through
the arguments given previously and would again find that $\delta(t)$ decreases in magnitude by
the same factor, $|1 - 2\mu|$. It follows that if we start, say, with all white balls and take $T$
steps with the first process, followed by $T$ steps with the second process, we should have
$\delta = (1 - 2\mu)^{2T}$. However, the second process is precisely the inverse of the first. Therefore
in fact $\delta$ is again 1.

The recurrence paradox arises from the simple observation that after $2N$ time steps every
ball has been in every site exactly 2 times. It has therefore changed color an even number
of times and the original configuration has been restored. Therefore $\delta$ cannot have been
monotonically decreasing.
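Both paradoxes can be seen explicitly on a small ring. The sketch below is illustrative only: `forward` and `backward` are assumed names for the process described earlier and its inverse, and the ring size, active fraction and number of steps are arbitrary.

```python
import random

def forward(colors, active):
    """Balls move from site p-1 to site p, flipping color on leaving an active site."""
    n = len(colors)
    return [(-colors[p - 1] if active[p - 1] else colors[p - 1]) for p in range(n)]

def backward(colors, active):
    """Inverse process: balls move from site p+1 to site p, flipping color
    on entering an active site."""
    n = len(colors)
    return [(-colors[(p + 1) % n] if active[p] else colors[(p + 1) % n])
            for p in range(n)]

def delta(colors):
    return sum(colors) / len(colors)

if __name__ == "__main__":
    rng = random.Random(0)
    n, T = 1000, 15
    active = [rng.random() < 0.1 for _ in range(n)]
    c = [1] * n
    for _ in range(T):
        c = forward(c, active)
    print("after T forward steps:   delta =", delta(c))
    for _ in range(T):
        c = backward(c, active)
    print("after T backward steps:  delta =", delta(c))   # back to 1.0
    for _ in range(2 * n):
        c = forward(c, active)
    print("after 2N forward steps:  delta =", delta(c))   # recurrence: 1.0
```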



B. Resolution 



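Make the proof more formal to see what's happening.
Define

$$\epsilon_p = \begin{cases} -1 & \text{if } p \in A \\ +1 & \text{if } p \notin A \end{cases}, \qquad p = 1,\dots,N \quad \text{(site variables)} \tag{10}$$

$$\eta_p = \begin{cases} +1 & \text{if ball-}p\text{ is white} \\ -1 & \text{if ball-}p\text{ is black} \end{cases}, \qquad p = 1,\dots,N \quad \text{(ball variables)} \tag{11}$$

Then the equation of motion is

$$\eta_p(t) = \epsilon_{p-1}\,\eta_{p-1}(t-1), \tag{12}$$

which implies

$$\eta_p(t) = \epsilon_{p-1}\epsilon_{p-2}\cdots\epsilon_{p-t}\,\eta_{p-t}(0). \tag{13}$$

For $\delta$ we have (with some index shifts, and with site indices understood modulo $N$)

$$\delta(t) = \frac{1}{N}\sum_p \eta_p(t) = \frac{1}{N}\sum_p \Big[\prod_{j=0}^{t-1}\epsilon_{p+j}\Big]\eta_p(0). \tag{14}$$
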

The big step in resolving the paradoxes is to give up the idea of proving entropy increase 
for individual systems, and instead only try to establish the result for an ensemble of like 
systems. This is the cure prescribed in modern statistical mechanics. Let's see how it plays 
out for the ring model. 

Usually one takes ensembles of initial conditions, but here we'll take an ensemble of 
active-site locations. That is, we'll consider many rings on each of which the specific subset 
that constitutes A is a random variable. In his original notes [4] and in his book [5], Kac fixes 
the exact number of sites in A and averages over their distribution. This is a worthwhile 
exercise in statistical mechanics, but we'll take an easier path. We'll simply flip a biased 
coin for each site and with probability $\mu$ set it to be "active." (This is analogous to using the
grand canonical ensemble rather than the canonical ensemble.) The $\epsilon$'s are thus independent
identically distributed random variables ("i.i.d.") with expectation value

$$\langle\epsilon\rangle = (-1)\mu + (+1)(1 - \mu) = 1 - 2\mu. \tag{15}$$

Then

$$\langle\delta(t)\rangle = \frac{1}{N}\sum_p \Big\langle\prod_{j=0}^{t-1}\epsilon_{p+j}\Big\rangle\,\eta_p(0). \tag{16}$$

Since the coin flips are not correlated the expectation of the product is the product of the
expectations, and

$$\langle\delta(t)\rangle = \frac{1}{N}\sum_p (1 - 2\mu)^t\,\eta_p(0) = (1 - 2\mu)^t\,\delta(0). \tag{17}$$

This is a real proof, but what about the paradoxes?

Consider the recurrence paradox. Once $t$ exceeds $N$, both $\epsilon_p$ and $\epsilon_{N+p}$ appear in the
product. They are the same site, and as far from being uncorrelated as you can get. So
instead of $\langle\epsilon_p\epsilon_{N+p}\rangle = (1 - 2\mu)^2$, you have $\langle\epsilon_p\epsilon_{N+p}\rangle = \langle(\pm 1)^2\rangle = 1$. Thus, in doing the proof
you must be careful that the absence-of-correlation assumption is truly justified.
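The ensemble statement, and the way it fails once the same site appears twice in the product, can be illustrated by sampling active-site sets with a biased coin. This is a sketch under assumed parameters (N = 50, μ = 0.1); `delta_after` evaluates the product form of Eq. (14) directly and the names are illustrative.

```python
import random

def delta_after(t, eps, eta0):
    """Eq. (14): delta(t) = (1/N) sum_p [prod_{j<t} eps[p+j]] eta0[p], indices mod N."""
    n = len(eps)
    total = 0
    for p in range(n):
        prod = 1
        for j in range(t):
            prod *= eps[(p + j) % n]
        total += prod * eta0[p]
    return total / n

def ensemble_mean(t, n, mu, samples, rng):
    """Average delta(t) over rings whose sites are independently active with probability mu."""
    eta0 = [1] * n                    # all white initially, delta(0) = 1
    acc = 0.0
    for _ in range(samples):
        eps = [-1 if rng.random() < mu else 1 for _ in range(n)]
        acc += delta_after(t, eps, eta0)
    return acc / samples

if __name__ == "__main__":
    rng = random.Random(0)
    n, mu = 50, 0.1
    for t in (1, 5, 10):
        print(t, round(ensemble_mean(t, n, mu, 2000, rng), 3), round((1 - 2 * mu) ** t, 3))
    # At t = 2N every epsilon appears twice in each product, so delta = 1 exactly.
    print(2 * n, ensemble_mean(2 * n, n, mu, 20, rng), (1 - 2 * mu) ** (2 * n))
```

For t well below N the sampled mean tracks $(1-2\mu)^t$; at $t = 2N$ every sample returns exactly 1, while the uncorrelated formula would give an essentially vanishing value.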

The reversibility paradox is handled similarly. 
Exercise: Write down the "equation of motion" for the inverse process and show how 
correlations enter when contemplating forward and backward motions. 



C. Does this explain the arrow of time? 

No, it does not. We've just seen in detail how a time-asymmetric result can be derived,
but it only helps to sharpen the quandary over the source of our experiential arrow. On the
left side of Fig. 3 is displayed the entropy increase for a system for which $\delta(0) \approx 0.7$. But if
the same (time-0) ball-state is used to work back to negative times, entropy also increases,
as is shown in the right-hand illustration in the same figure. This is exactly the conceptual
point, in addition to the paradoxes, that one must confront in dealing with the Boltzmann
H-theorem. (Recall that Boltzmann himself postulated an enormous fluctuation to account
for the thermodynamic arrow [6].)

FIG. 3: Entropy as a function of time with initial conditions only, and for the same time-zero state
working back to the state at earlier times.

Why then does S increase away from zero? It's simple. We fixed a low entropy (i.e., large
$|\delta|$) state at time zero. This is a very unusual state, and when you do more or less anything
to it, it becomes a less unusual state, i.e., its entropy increases. So the arrow arises only if 
you decide that time-0 is the beginning of your observations, which is something you usually 
decide because that's when you prepared your system. Prior to time-0 it was subject to 
other forces. So the source of the asymmetry is not the H-theorem, but is a consequence of 
your ability to prepare low entropy states as initial rather than final conditions. 

Incidentally this view of the second law carries to other situations. Namely, if you start 
the system in an unusual state, and if it explores the state space, then it is more likely to 
be found in a less unusual state. This already implies an increase in entropy, making the 
second law something of a tautology. It's not quite a tautology, since the whole subject of 
ergodic theory concerns itself with whether or not the system does explore the entire state 
space. But if it does, you already have much of the content of the second law. 



D. Information as $-\sum p\log p$

Here is a review of the connection between information and "$-\sum p\log p$." The reference
is the book of Katz [7].

Suppose you have m boxes and there is a ball in one of them, but you don't know which.
The information you are lacking is a function of m: If m is 1, you know everything; if m
is large you know little. Call the missing information I(m). Then I(1) = 0 and I(m) is
monotonically increasing with m.

Now suppose you have a rectangular array of boxes, m by n. The ball is in one of these
mn boxes, and again you don't know which. So your missing information is I(mn). On the
other hand, you could be directed to the correct box if someone would tell you the row and
column in this rectangular array, so the missing information is also I(m) + I(n). This leads
to the fundamental relation

$$I(mn) = I(m) + I(n). \tag{18}$$

The solution to this functional equation can be found by assuming it holds for arbitrary real
numbers, say $I(xy) = I(x) + I(y)$, for $x, y \in \mathbb{R}^+$. Then differentiate with respect to $y$ and
set $y = 1$, which gives $x\,I'(x) = I'(1)$. Call $k = I'(1)$, which is nonnegative by the monotonic
increasing property of $I$.

Writing $I'(x) = k/x$ and using $I(1) = 0$ leads to $I(x) = k\log x$.

Now we want to know the missing information if one has a probability distribution (knowing
that the ball is in one of m boxes is also a probability distribution, but a trivial one;
1/m for each box). Suppose there are $n$ possibilities, with probabilities $p_\alpha$, $\alpha = 1,\dots,n$. To
deal with this we suppose the numbers $p_\alpha$ are rational numbers, with common denominator
$N'$ (which may be very large). In fact, we even take a large multiple of $N'$, call it $N$, so
that not only are the numbers $Np_\alpha$ integers, but they are large integers. We now imagine
$N$ independent trials. Because $N$ is so large, to a good approximation we can ignore
fluctuations. Thus in $Np_{\alpha_1}$ trials we'll get the result $\alpha_1$, in $Np_{\alpha_2}$ trials we'll get the result $\alpha_2$, etc.
What then is the missing information? We don't know in what order these were selected.
So the missing information is the logarithm of the number of ways in which we can get the
above outcome. This is

$$I_{N\,\text{trials}} = k\log\frac{N!}{\prod_\alpha (Np_\alpha)!}. \tag{19}$$

By Stirling's approximation this is

$$I_{N\,\text{trials}} = -Nk\sum_{\alpha=1}^{n} p_\alpha\log p_\alpha + O(\log N). \tag{20}$$

The missing information per trial is $1/N$ times this:

$$I = -k\sum_{\alpha=1}^{n} p_\alpha\log p_\alpha, \tag{21}$$

a basic formula of information theory.
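The passage from Eq. (19) to Eq. (21) can be checked numerically by comparing the logarithm of the multinomial coefficient, per trial, with $-\sum p\log p$. This is a sketch with $k = 1$; the distribution (0.5, 0.3, 0.2) and the function names are illustrative assumptions.

```python
import math

def missing_info_exact(counts):
    """Eq. (19) with k = 1: log of N! / prod(n_a!) for outcome counts n_a."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)

def shannon(probs):
    """Eq. (21) with k = 1: -sum p log p."""
    return -sum(p * math.log(p) for p in probs if p > 0)

if __name__ == "__main__":
    probs = [0.5, 0.3, 0.2]
    for n in (10, 1000, 1000000):
        counts = [round(n * p) for p in probs]
        # per-trial missing information approaches the Shannon value as N grows
        print(n, missing_info_exact(counts) / n, shannon(probs))
```

The difference between the two columns shrinks like $(\log N)/N$, as Eq. (20) indicates.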



E. Classical and quantum recurrence theorems 

1. The classical theorem. From [1]. 

The Poincare recurrence theorem is frugal in its assumptions, immediate in its proof and 
far reaching in its consequences. All you need is a measure-preserving transformation that 
is one-to-one and maps a set of positive, finite measure into itself under the action of the 
transformation. The cat map of Sec. I F satisfies this, and more importantly, so does the
time evolution of a confined system in classical mechanics.

Because dynamical systems often possess small subsets for which atypical behavior can
occur (e.g., for the cat map (0,0) goes nowhere, while (0.4,0.2) is periodic) it is necessary
to formulate the recurrence theorem using the language of measure theory. You can think
of Lebesgue measure as ordinary length, area or volume, generalized so that you can talk
about the measure of sets defined through limit processes. Countable sets or sets of lower
dimension (lines in the plane, planes in 3-space) have measure zero. A useful notion once you
have Lebesgue measure is "almost always" or "almost everywhere," referring to statements
that are true except possibly on a set of measure zero.

The theorem states that almost all points return arbitrarily closely to their initial
positions. To state this more precisely, we make the following definitions:

For each $t$, a continuous or discrete real parameter, let a transformation $\phi_t$ map a space $\Omega$
of finite measure into itself in a measure preserving way. That is, if $\mu$ is the measure and $A$
a subset of $\Omega$, $\mu(\phi_t(A)) = \mu(A)$ for all $t$.

Let $\phi_t\phi_s = \phi_{t+s}$.

We further assume: $\phi_t(\Omega) \subset \Omega$, $\mu(\Omega) < \infty$, and $\phi_t$ has an inverse.

It follows that if $A \subset \Omega$, then for almost all $\omega \in A$, there are arbitrarily large values of $t$
such that $\phi_t(\omega) \in A$.

To prove this, pick a time $\tau$ and let $B$ be the subset of $A$ that does not recur after a
time $\tau$, i.e., for all $t \ge \tau$, $\phi_t(B) \cap A = \emptyset$. We want to prove that $B$ has measure zero. By
definition, for all $t \ge \tau$, $\phi_t(B) \cap B = \emptyset$. Let $\Phi = \phi_\tau$; it is sufficient to restrict discussion to
powers of $\Phi$. The proof examines the orbit of $B$, i.e., the set of points $\Phi^n(B)$ for $n = 0, 1, \dots$
By the hypothetical non-recurring property of $B$ it follows (I'll show this in a moment) that
the orbit does not intersect itself, so all those images of $B$ are disjoint. But if they are
disjoint their measure adds and if the measure of $B$ were anything but 0 the orbit would
have infinite measure, which is impossible since that orbit is a subset of the finite-measure
set, $\Omega$. The non-intersecting statement for the orbit is the following: for every $n, m \ge 0$ and
$n \ne m$, $\Phi^n(B) \cap \Phi^m(B) = \emptyset$. To see this, suppose it to be false and that the intersection is
a non-empty set $C$. Let $n < m$. Apply $\Phi^{-n}$ to both sides to get $B \cap \Phi^{m-n}(B) = \Phi^{-n}(C)$.
Because $[(C \ne \emptyset) \Rightarrow (\Phi^{-n}(C) \ne \emptyset)]$, this would give us a point in $B$ that recurs after time
$(m - n)\tau$, contrary to hypothesis.



2. The quantum theorem. From [1].

Consider a system confined to a box and let it have wave function $\psi(0)$ at time 0. The
quantum recurrence theorem states that the time-evolved wave function gets arbitrarily close
to $\psi(0)$ at arbitrarily large times. That is, for any $\epsilon$ and any $T_0$, there exists a $T$ such that

$$\|\psi(T) - \psi(0)\| < \epsilon, \quad \text{with } T > T_0.$$

As for the classical Poincare recurrence theorem, the proof depends on the system's
being confined to a finite region. (A free particle just keeps going.) For the classical case,
the finiteness was expressed by the requirement that the phase space volume was finite.
For quantum mechanics the discreteness of the spectrum does the job.

We will prove the theorem with a trick: appeal to the classical theorem [8]. The idea
is that when you truncate the modes of a confined quantum system, its dynamics becomes
that of a finite collection of classical oscillators. Then use the classical Poincare recurrence
theorem for those oscillators.

Let the given initial wave function be $\psi(0)$, call the Hamiltonian $H$, let its eigenvalues be
$E_n$ and the corresponding eigenfunctions $u_n$. Then at time $t$ the wave function is given by

$$\psi(t) = \sum_n r_n\, e^{-iE_n t}\, u_n$$

with $r_n = \langle u_n|\psi(0)\rangle$. Without loss of generality, we define the phases of the $u_n$ so that the
$r_n$ are real. We must prove that for any $\epsilon > 0$ and $T_0$ we can find $T > T_0$ such that

$$\|\psi(T) - \psi(0)\|^2 = 2\sum_n r_n^2\,(1 - \cos E_n T) < \epsilon.$$

Since $\sum_n r_n^2 = 1$, we can find $N$ such that

$$2\sum_{n=N+1}^{\infty} r_n^2\,(1 - \cos E_n t) \le 4\sum_{n=N+1}^{\infty} r_n^2 < \frac{\epsilon}{2}.$$

The foregoing holds uniformly in $t$. It remains to show that there is a $T > T_0$ such that

$$2\sum_{n=1}^{N} r_n^2\,(1 - \cos E_n T) < \frac{\epsilon}{2}. \tag{22}$$


This part of the proof is usually accomplished by quoting theorems about almost periodic
functions. However, the sum in Eq. (22) can immediately be related to the motion of $N$
harmonic oscillators (classical ones) and then we will be able to use the classical recurrence
theorem. The idea is simple, but technicalities require a plague of $\epsilon$'s.
For the details of this plague consult [1].

The idea is that if you take $N$ oscillators with frequencies $E_n$, then the sum in Eq. (22)
compares their configuration at time-0 with that of a later time ($T$). The classical recurrence
theorem states that there will be times $T$ (later than any $T_0$ you care to give) for which the
oscillators are all close to having $\exp(iE_n T)$ be one. For these times the left-hand side of
Eq. (22) is small and the theorem is proved.
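For a truncated spectrum the near-returns of Eq. (22) can be found by brute force. The following sketch is illustrative only: the two-level example with energies 1 and $\sqrt{2}$, the equal weights, and the grid scan in `find_recurrence` are assumptions chosen for simplicity, not part of the proof.

```python
import cmath
import math

def distance_sq(r, energies, t):
    """||psi(t) - psi(0)||^2 computed directly from the expansion coefficients."""
    return sum(abs(rn * (cmath.exp(-1j * e * t) - 1)) ** 2
               for rn, e in zip(r, energies))

def distance_sq_formula(r, energies, t):
    """The same quantity written as 2 * sum_n r_n^2 (1 - cos E_n t)."""
    return 2 * sum(rn ** 2 * (1 - math.cos(e * t)) for rn, e in zip(r, energies))

def find_recurrence(r, energies, t0, t_max, dt=0.001):
    """Scan t in (t0, t_max] on a grid; return (t, distance_sq) at the closest return."""
    best_t, best_d = None, float("inf")
    steps = int((t_max - t0) / dt)
    for i in range(1, steps + 1):
        t = t0 + i * dt
        d = distance_sq_formula(r, energies, t)
        if d < best_d:
            best_t, best_d = t, d
    return best_t, best_d

if __name__ == "__main__":
    r = [math.sqrt(0.5), math.sqrt(0.5)]      # real coefficients, sum of squares = 1
    energies = [1.0, math.sqrt(2.0)]          # incommensurate frequencies
    print(find_recurrence(r, energies, t0=10.0, t_max=100.0))
```

Enlarging the scan window produces ever closer returns, at the price of longer waiting times.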

Remark: Quantum recurrence allows no exceptions, not even as infrequent as the measure 
zero initial conditions of classical mechanics. 

Remark: Note where the finite volume restriction played a role. It was essential to Eq. (22); 
discreteness of the spectrum allowed truncation to a finite sum. 

Remark: It would be interesting to see whether things could be turned round and the
Poincare recurrence theorem used to establish results in the theory of almost periodic
functions.



F. The Cat Map 

This is a richer dynamical system, a bit closer to classical mechanics. The universe (a
mock phase space) is the unit square, $I^2$. Points are designated $(x, y)$, $0 \le x < 1$, $0 \le y < 1$,
and the transformation is

$$x' = x + y \mod 1 \tag{23}$$

$$y' = x + 2y \mod 1. \tag{24}$$

Writing $\xi = \begin{pmatrix} x \\ y \end{pmatrix}$, the transformation can be alternatively expressed as

$$\xi' = \phi(\xi) = M\xi \mod 1, \quad \text{with } M = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}. \tag{25}$$
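The atypical points mentioned in Sec. I E, the fixed point (0,0) and the periodic point (0.4,0.2), can be checked with exact rational arithmetic. This is a small sketch using Python's `fractions` module; the helper names `cat_map` and `period` are illustrative.

```python
from fractions import Fraction

def cat_map(point):
    """Eqs. (23)-(24): (x, y) -> (x + y mod 1, x + 2y mod 1), exact for Fractions."""
    x, y = point
    return ((x + y) % 1, (x + 2 * y) % 1)

def period(point, max_steps=10000):
    """Smallest n >= 1 with cat_map^n(point) == point, or None if none is found."""
    q = point
    for n in range(1, max_steps + 1):
        q = cat_map(q)
        if q == point:
            return n
    return None

if __name__ == "__main__":
    print(period((Fraction(0), Fraction(0))))          # fixed point
    print(period((Fraction(2, 5), Fraction(1, 5))))    # the point (0.4, 0.2)
    print(period((Fraction(1, 7), Fraction(3, 7))))    # another rational point
```

Every point with rational coordinates is periodic, since the map permutes the finite set of points with a given common denominator; these points form a countable set and hence have measure zero.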
The effect of this map on a number of points initially in a small rectangle near the origin
is shown in Fig. 4. Evidently, if one thinks of these points as gas molecules, this system
comes rapidly to equilibrium.

FIG. 4: Cat map for points initially near the origin.

The reason for this equilibration can be seen in the spectrum of $M$. Its eigenvalues are

$$\lambda_\pm = \frac{3 \pm \sqrt{5}}{2} = e^{\pm 0.962}, \qquad \lambda_+ \approx 2.618, \tag{26}$$

with eigenvectors [9]

$$v_\pm = \text{const}\cdot\begin{pmatrix} 1 \\ \lambda_\pm - 1 \end{pmatrix}. \tag{27}$$

By virtue of the large Lyapunov coefficient ($\approx 0.96$) points along the direction $v_+$ are
stretched by a factor $\approx 2.6$, while those orthogonal to this direction (and along $v_-$) are
squeezed together (keeping constant area, since $\det M = 1$). The action of $\phi$ on a rectangle
is shown in Fig. 5.
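The spectral data of $M$ are easy to verify numerically. The sketch below checks the eigenvalue equation for the vectors $(1, \lambda_\pm - 1)$, the value of the Lyapunov coefficient $\log\lambda_+$, and the orthogonality of the two eigen-directions; the function and variable names are illustrative.

```python
import math

M = ((1, 1), (1, 2))
lam_plus = (3 + math.sqrt(5)) / 2
lam_minus = (3 - math.sqrt(5)) / 2

def eig_residual(m, lam, v):
    """Largest component of M v - lam v; zero for an exact eigenpair."""
    mv = (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])
    return max(abs(mv[0] - lam * v[0]), abs(mv[1] - lam * v[1]))

v_plus = (1.0, lam_plus - 1)
v_minus = (1.0, lam_minus - 1)

if __name__ == "__main__":
    print("lambda+ =", lam_plus, " log lambda+ =", math.log(lam_plus))
    print("lambda+ * lambda- =", lam_plus * lam_minus)     # det M = 1
    print("residuals:", eig_residual(M, lam_plus, v_plus),
          eig_residual(M, lam_minus, v_minus))
    print("v+ . v- =", v_plus[0] * v_minus[0] + v_plus[1] * v_minus[1])
```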

And finally, I would be remiss without an actual feline illustration, Fig. 6. 

We wish a precise statement characterizing the apparent fact that this system comes
to equilibrium. To do this, we consider a collection of points in $I^2$ to be an ideal gas of
distinguishable particles and we coarse grain $I^2$ by dividing it into $G$ rectangles. The only
information retained after the coarse graining is the number of particles in each rectangle.
The missing information is the way the particles have been distributed, $n_1, n_2, \dots, n_G$, with
$\sum_k n_k = N$, the total number of particles. Using Stirling's approximation for the appropriate
combinatorial coefficient, this gives for the entropy

$$\frac{S}{N} = -\sum_{k=1}^{G} p_k\log p_k \quad \text{with } p_k = \frac{n_k}{N}. \tag{28}$$

FIG. 5: Action of the cat map on a rectangle.

FIG. 7: Entropy as a function of time for the cat map. Number of points: 500. Coarse grains: 10 by 10.



(Note: this is not the Kolmogorov entropy.) 

Generally speaking, entropy increases when energy sequestered in a single macroscopic 
degree of freedom (or a small number) gets spread among many. For our coarse graining, 
"macroscopic" means that you can distinguish the various rectangles, but can't do better 
than that. So if all the particles (and their energy) go from being in a single rectangle to 
being spread among many, the entropy has increased. This is shown in Fig. 7. 
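The entropy growth described here can be reproduced with a few lines of code. This is a sketch that mirrors the parameters quoted for Fig. 7 (500 points, a 10 by 10 coarse graining); the initial rectangle $[0, 0.05)^2$, the random seed, and the function names are assumptions.

```python
import math
import random

def cat_map(x, y):
    """Eqs. (23)-(24) in floating point."""
    return (x + y) % 1.0, (x + 2 * y) % 1.0

def coarse_entropy(points, g):
    """Eq. (28): -sum_k p_k log p_k over a g-by-g grid of equal rectangles."""
    counts = {}
    for x, y in points:
        cell = (min(int(x * g), g - 1), min(int(y * g), g - 1))
        counts[cell] = counts.get(cell, 0) + 1
    n = len(points)
    return -sum(c / n * math.log(c / n) for c in counts.values())

def entropy_history(n_points=500, g=10, steps=12, seed=0):
    """Entropy per particle at t = 0, 1, ..., steps for points started near the origin."""
    rng = random.Random(seed)
    pts = [(0.05 * rng.random(), 0.05 * rng.random()) for _ in range(n_points)]
    history = []
    for _ in range(steps + 1):
        history.append(coarse_entropy(pts, g))
        pts = [cat_map(x, y) for x, y in pts]
    return history

if __name__ == "__main__":
    for t, s in enumerate(entropy_history()):
        print(t, round(s, 3))
    print("maximum possible:", round(math.log(100), 3))
```

With a finite number of points the entropy saturates slightly below $\log G$, because the occupation numbers of the cells fluctuate.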



1. Criteria for equilibration 

A characteristic of a dynamical system that indicates a fairly strong tendency to
equilibrate is ergodicity. In the mathematical literature this property comes in many flavors,
strong, weak, etc. But I won't aim for this level of precision.

Generally, ergodicity means phase space averages are equal to time averages. For $f$ a
function on $\Omega$, the phase space, the phase space average is

$$\mathcal{P}(f) = \int_\Omega f(\omega)\,d\mu(\omega). \tag{29}$$

The time average needs another argument, namely the initial point on which the time
transformation ($\phi_t$) began to act. Thus

$$\mathcal{T}(f,\omega) = \lim_{T\to\infty}\frac{1}{T}\int_0^T f(\phi_t(\omega))\,dt. \tag{30}$$

To say that $\phi_t$ is ergodic means that $\mathcal{P}(f) = \mathcal{T}(f,\omega)$, independent of the initial point, $\omega$.
This turns out to be equivalent to the assertion that any function invariant under $\phi_t$ is a
constant on $\Omega$. See [1] for the proof. (For classical mechanics this constant would be the
energy.)

The cat map is even better at coming to equilibrium than merely being ergodic. It is
what is called "mixing." A transformation is mixing if for two (measurable) subsets $A$ and
$B$ of $\Omega$,

$$\lim_{t\to\infty}\mu(\phi_t(A)\cap B) = \mu(A)\,\mu(B), \tag{31}$$

where, as usual, we take $\mu(\Omega) = 1$. This represents a kind of statistical independence.

Compare it to the formula for conditional probabilities,

$$\Pr(C|D) = \frac{\Pr(C\cap D)}{\Pr(D)}, \tag{32}$$

where $C$ and $D$ are "events," i.e., subsets of $\Omega$. When they are independent, $\Pr(C\cap D) =
\Pr(C)\Pr(D)$, so that in that case, $\Pr(C|D) = \Pr(C)$. Belonging to $D$ gives no information.

To be more explicit, identify probability (Pr) and measure ($\mu$). Then if $\phi$ is mixing,
$\Pr(\phi_t(\omega) \in A\,|\,\omega \in B) = \mu(\phi_t(B)\cap A)/\mu(B) \to \mu(A)$.

Exercise: Show that mixing implies ergodicity. Hint: use the fact that there is only one
constant invariant under $\phi_t$ for an ergodic system. (You might want to prove that too.)
See [1].
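The mixing property, Eq. (31), can be probed by Monte Carlo for the cat map of Sec. I F. In the sketch below the sets $A = [0, 0.5)\times[0, 0.5)$ and $B = \{x \ge 0.5\}$, the sample size and the function names are illustrative assumptions. Because the map is invertible and measure preserving, $\mu(\phi_t(A)\cap B)$ equals $\mu(A)$ times the fraction of points of $A$ that land in $B$ after $t$ steps.

```python
import random

def cat_map(x, y):
    """Cat map on the unit square in floating point."""
    return (x + y) % 1.0, (x + 2 * y) % 1.0

def overlap_measure(t, n_samples=20000, seed=0):
    """Monte Carlo estimate of mu(phi_t(A) & B) for
    A = [0, 0.5) x [0, 0.5) (measure 1/4) and B = {x >= 0.5} (measure 1/2)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x, y = 0.5 * rng.random(), 0.5 * rng.random()   # uniform point in A
        for _ in range(t):
            x, y = cat_map(x, y)
        if x >= 0.5:                                     # image point lies in B
            hits += 1
    return 0.25 * hits / n_samples

if __name__ == "__main__":
    for t in range(0, 9):
        print(t, round(overlap_measure(t), 4), "target:", 0.25 * 0.5)
```

At $t = 0$ the sets are disjoint and the overlap is zero; within a few steps the estimate settles near $\mu(A)\mu(B) = 0.125$.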

G. The cat map is mixing 

This is another digression, but it happens that it does not take much machinery to really
prove the mixing property for the cat map, leading to one less piece of wisdom that you are
expected to take on faith. I also like this proof because you see that the Fibonacci numbers
appear in the cat map story. (Hint: $\lambda_+$ is exactly 1 more than the Golden Ratio.)

We use the notation, $M$, $\xi = \begin{pmatrix} x \\ y \end{pmatrix}$, $\phi$, etc. of Sec. I F. For measurable sets $A, B \subset \Omega$,
and $\phi_t = \phi^t$ for integer $t$, we want to prove Eq. (31). Define the characteristic function

$$\chi_A(\xi) = \begin{cases} 1 & \text{if } \xi \in A \\ 0 & \text{if } \xi \notin A \end{cases} \tag{33}$$

Then what we need to show is

$$\lim_{t\to\infty}\int \chi_{\phi_t(A)}(\xi)\,\chi_B(\xi) = \int \chi_A(\xi)\int \chi_B(\xi) \tag{34}$$

(suppressing the $d\mu$). The proof uses Fourier analysis and we define the functions

$$e_{pq}(x, y) = e^{2\pi i(px + qy)}, \quad p, q \text{ integers}. \tag{35}$$

We also use the notation $\nu = \begin{pmatrix} p \\ q \end{pmatrix}$, so that these functions can be written $e_\nu(\xi)$. Then for
a measurable function $f$ on the unit square one has

$$f(x, y) = \sum_{p,q} f_{pq}\,e_{pq}(x, y) = \sum_\nu f_\nu\,e_\nu(\xi) \tag{36}$$

with $f_\nu = \int_{I^2} d^2\xi\, f(\xi)\,\overline{e_\nu(\xi)}$.

The essence of the proof lies in the observation

$$f(\phi(\xi)) = \sum_\nu f_\nu\,e_\nu(\phi(\xi)) = \sum_\nu f_\nu \exp\big(2\pi i[p(x + y) + q(x + 2y)]\big) \tag{37}$$

$$= \sum_\nu f_\nu \exp\big(2\pi i[(p + q)x + (p + 2q)y]\big) \tag{38}$$

$$= \sum_\nu f_\nu\,e_{\phi(\nu)}(\xi) = \sum_\nu f_{\phi^{-1}(\nu)}\,e_\nu(\xi). \tag{39}$$
Note that in acting on $\nu$, $\phi$ does not invoke the mod 1 operation and is only multiplication
by the matrix $M$, or $M^{-1}$ for $\phi^{-1}$. The inverse of $M$ is

$$M^{-1} = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}, \tag{40}$$

which has the same eigenvalues as $M$. Let $u_\pm$ be the eigenvectors of $M^{-1}$ with eigenvalues
$\lambda_\pm$ (same $\lambda$'s as before, i.e., $\lambda_+ > 1$). Then we can write

$$\nu = a\,u_+ + b\,u_-, \tag{41}$$

so that the action of $\phi^{-1}$ on $\nu$ is simply multiplication of the appropriate coefficient by the
appropriate eigenvalue.

What makes the proof work is that for all $\nu$, except (0,0), $\phi^{-t}(\nu)$ eventually becomes
large. This is because of the (irrational) $\sqrt{5}$ that appears in $\lambda$ and $u$. Because $\nu$ consists of
integers, both $a$ and $b$ must be nonzero.

The functions to which we apply these results are the characteristic functions defined
earlier,

$$\chi_A(\xi) = \sum_\nu a_\nu\,e_\nu(\xi) \quad \text{and} \quad \chi_B(\xi) = \sum_\nu b_\nu\,e_\nu(\xi). \tag{42}$$

Note that $a_0 = \int \chi_A = \mu(A)$, and similarly for $B$. Since

$$\xi \in \phi(A) \iff \phi^{-1}(\xi) \in A, \tag{43}$$

$$\chi_{\phi(A)}(\xi) = \chi_A(\phi^{-1}(\xi)) = \sum_\nu a_\nu\,e_\nu(\phi^{-1}(\xi)) = \sum_\nu a_{\phi(\nu)}\,e_\nu(\xi), \tag{44}$$

where the last step follows as in the demonstration going the other way ($\phi$ rather than $\phi^{-1}$).
What we need to show is that

$$\Delta_t \equiv \int \chi_{\phi_t(A)}(\xi)\,\chi_B(\xi) - \int \chi_A(\xi)\int \chi_B(\xi) \to 0 \tag{45}$$


for $t \to \infty$. This can be written

$$\Delta_t = \sum_{\mu,\nu} b_\mu\,a_\nu \int e_\mu(\xi)\,e_{\phi_{-t}(\nu)}(\xi) - a_0 b_0. \tag{46}$$

In the usual spirit of convergence proofs, we imagine that some small $\epsilon$ has been given and
we must show that $|\Delta_t|$ is less than $\epsilon$ for large enough $t$. To do this, truncate the sums
over $\mu$ and $\nu$ so that $\|\mu\|$ and $\|\nu\|$ are less than some number $R$ with error (in the sum) less
than $\epsilon$. By the convergence of the Fourier expansion for the $\chi$'s such an $R$ exists.
($\|\mu\| = \sqrt{p^2 + q^2}$ for $\mu = (p, q)$.)

By the orthogonality of the Fourier expansion functions, the only non-vanishing terms in
the expansion Eq. (46) (now truncated to $\|\mu\| < R$) are those for which $\mu + \phi_{-t}(\nu) = 0$. But
except for (0,0), for sufficiently large $t$ all $\phi_{-t}(\nu)$'s escape the circle and the sum vanishes.
As for the (0,0) term, it is cancelled by $a_0 b_0$.

Note that the powers of $M$ give the Fibonacci numbers, as do the powers of $M - 1$ (that is, $M$
minus the identity). The larger eigenvalue of $M - 1$ is the famous Golden ratio, $(1 + \sqrt{5})/2$.
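The Fibonacci structure of the matrix powers is quick to confirm with integer arithmetic. In this sketch $F_0 = 0$, $F_1 = 1$; the helper names are illustrative, and `Q` denotes $M$ minus the identity.

```python
def mat_mul(a, b):
    """Product of two 2x2 integer matrices given as nested tuples."""
    return ((a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
            (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]))

def mat_pow(m, n):
    """n-th power (n >= 0) of a 2x2 integer matrix by repeated multiplication."""
    result = ((1, 0), (0, 1))
    for _ in range(n):
        result = mat_mul(result, m)
    return result

def fib(n):
    """Fibonacci numbers with F_0 = 0, F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

M = ((1, 1), (1, 2))
Q = ((0, 1), (1, 1))      # M minus the identity

if __name__ == "__main__":
    for n in range(1, 6):
        print(n, mat_pow(M, n), mat_pow(Q, n))
```

The entries of $M^n$ are $F_{2n-1}, F_{2n}, F_{2n+1}$ and those of $(M-1)^n$ are $F_{n-1}, F_n, F_{n+1}$, consistent with $(M-1)^2 = M$.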
II. QUANTUM IRREVERSIBILITY

Let a particle be subject to the Hamiltonian

$$H = \frac{p^2}{2} + \lambda\,\delta(x) + V_{\text{wall}}(x), \tag{47}$$

where $V_{\text{wall}}(x)$ is $\infty$ for $x < -1$ and zero otherwise. (See Fig. 8.) This has no bound
states. Initially we take the wave function to be $\psi(x, 0) = -\Theta(-x)\sqrt{2}\sin(\pi x)$, and see what
becomes of it.

FIG. 8: Arena for quantum decay.

To expand $\psi$ in eigenstates we solve the time independent problem, whose solution takes
the form

$$\psi(x) = \begin{cases} A\sin k(x + 1) & \text{if } x < 0 \\ B\,e^{ikx} & \text{if } x > 0 \end{cases} \tag{48}$$

where $E = k^2/2$. At $x = 0$ we demand continuity of $\psi$ and

$$\psi'(0^+) - \psi'(0^-) = 2\lambda\,\psi(0). \tag{49}$$

This implies

$$k\left[i - \frac{1}{\tan k}\right] = 2\lambda. \tag{50}$$

It is clear that this forces $k$ to be complex. For large $\lambda$ there will be metastable states with
$k \approx n\pi$. We focus on $n = 1$ and set $k = \pi + z$ with $z$ small. The equation satisfied by $z$ is

$$\frac{2\lambda}{\pi + z} = i - \frac{1}{\tan z}. \tag{51}$$

It's a bit tricky to get the imaginary part of $z$ here since it turns out to be $O(1/\lambda^2)$. In
any case, after a bit of algebra one obtains that $\text{Im}\,E = -\pi^2/\lambda^2$. There is thus a decay rate
$\Gamma = 2\pi^2/\lambda^2$. This system shows irreversibility [10].
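Eq. (51) can also be solved numerically, which makes the sign and the $O(1/\lambda^2)$ scaling of the imaginary part visible. The sketch below rewrites Eq. (51) as the fixed-point problem $z = \arctan\big(1/(i - 2\lambda/(\pi + z))\big)$ and iterates from the starting guess $z = -\pi/2\lambda$; this iteration scheme, the starting guess and the function names are assumptions made for illustration, not the algebra referred to in the text.

```python
import cmath
import math

def solve_z(lam, iterations=100):
    """Fixed-point iteration for Eq. (51), written as
    z = atan(1 / (i - 2*lam / (pi + z)))."""
    z = complex(-math.pi / (2 * lam), 0.0)
    for _ in range(iterations):
        z = cmath.atan(1 / (1j - 2 * lam / (math.pi + z)))
    return z

def residual(lam, z):
    """Left side minus right side of Eq. (50) evaluated at k = pi + z."""
    k = math.pi + z
    return k * (1j - 1 / cmath.tan(k)) - 2 * lam

def complex_energy(lam):
    """E = k^2 / 2 at the n = 1 metastable state."""
    k = math.pi + solve_z(lam)
    return k * k / 2

if __name__ == "__main__":
    for lam in (10, 20, 40, 80):
        z = solve_z(lam)
        # Im z should be negative and fall by about 4 each time lam doubles
        print(lam, z, abs(residual(lam, z)), complex_energy(lam).imag * lam ** 2)
```

The last printed column, $\lambda^2\,\text{Im}\,E$, approaches a constant for large $\lambda$, which is the numerical counterpart of the $1/\lambda^2$ dependence of the decay rate.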
Note that $\mathrm{Im}\,k < 0$. This has an interesting implication for the wave function:

$$|\psi(x)| \sim \left|e^{i(\mathrm{Re}\,k + i\,\mathrm{Im}\,k)x}\right| = e^{-x\,\mathrm{Im}\,k} \to +\infty \quad \text{for } x \to \infty. \tag{52}$$



What's going on? This behavior of $\psi$ emphasizes the inappropriateness of looking at a steady-state solution for the decay problem. We're pretending that this system has been decaying forever. Therefore no matter how small $\Gamma$ is there was a time in the distant past when there was a lot more probability in the well. The large $x$ divergence of the wave function thus reflects amplitude that decayed long ago, when there was more amplitude in the interval $[-1,0]$.

Note too where the arrow comes from in this problem. It arises from the outgoing wave boundary condition. We have said, "This is the initial condition: all the amplitude is found in the well at $t = 0$ and we will compute what will happen for $t > 0$."

Remark: Recognition of the fact that you cannot evolve arbitrarily far into the past is a feature common to unstable particles. But in that case why is it that they have mass and spin as good quantum numbers? For stable particles we have Wigner's analysis of the irreducible representations of the Poincaré group. The labels of these irreducible representations are spin and mass, justifying the use of those quantities for characterizing particles. For unstable particles all you have is a semigroup. It turns out that you can analyze the irreducible representations of this structure, called the Poincaré semigroup, as well, and can arrive at the same quantum numbers (plus some other representations that you don't get in the stable case). See [11]. This approach has recently been used as a way to more reliably assign masses and widths to unstable particles [12]. It also generates some interesting mathematics connected with finding the states on which this representation acts [13]. See also [14].

To gain perspective on this system's arrow of time, imagine that instead of allowing the particle to escape to infinity we put a hard wall at some distant point $b$ ($b \gg 1$). We now have a particle in a box, slightly elaborated by having a bump at zero. So we know there is quantum recurrence. Let us estimate the recurrence time.

One can obtain the eigenfunctions of the Hamiltonian (now true, normalizable Hilbert space vectors) by taking the wave function in the interval $[0,b]$ of the form $\sin(kx + \phi)$, but I don't want to go through that exercise. Let the eigenstates of the system, whatever they are, be called $u_n$, with energies $E_n$. These energies are roughly $E_n \approx \frac{1}{2}\left(\frac{\pi}{b}\right)^2 n^2$. So the density of states ($dn/dE_n$) is essentially $b^2/n$ or $b/\sqrt{E}$. Now start with the same initial wave function as before, the metastable state that looks like the ground state of a particle in a box at $[-1,0]$. We express this wave function in terms of the $\{u_n\}$,

$$\psi(x,0) = \sum_n c_n u_n(x) \tag{53}$$

for some constants, $c_n$. For the time-$t$ state we multiply each term by $\exp(-iE_n t)$. For recurrence (as in Sec. IE 2) examine the difference

$$\Delta \equiv \|\psi(x,t) - \psi(x,0)\|^2 = \sum_n |c_n|^2 \left|1 - \exp(-iE_n t)\right|^2 = 4\sum_n |c_n|^2 \sin^2\left(\frac{E_n t}{2}\right). \tag{54}$$

To make this small we need to look at the range of $n$ for which $c_n$ is not small and estimate the probability that all terms $\sin(E_n t/2)$ are close to zero. We will assume that $t$ is so large that the numbers $E_n t/2\pi$ modulo 1 can be considered uniformly distributed on the unit interval. (This is also an assumption about the spectrum, which in a rigorous proof would have to be justified.)

Suppose we want $\Delta$ smaller than some $\epsilon$. We'll use half of $\epsilon$ to get rid of contributions outside the range of important $n$'s. This range can be approximated as the width of this state considered as a resonance, namely the $\Gamma$ calculated above (this is not a statement about $\sin\pi x$, which does not depend on $\lambda$, but about the $\{u_n\}$, which do). The number of states in the important range of $n$'s is thus $N = \Gamma \cdot [\text{density of states}] = \Gamma\, dn/dE$. For simplicity we further assume that the $c_n$'s do not vary much in this range, so that each $|c_n|^2$ is of order $1/N$. Therefore to get the sum in Eq. (54) to be less than $\epsilon/2$ we want each term $\sin^2(E_n t/2)$ to be less than $\epsilon/8$. This in turn requires that every one of the numbers ($E_n t/2$ mod 1) lie in an interval of length $2\sqrt{\epsilon/8}$. The probability of this occurring is $\left(\sqrt{\epsilon/2}\right)^{\Gamma\, dn/dE}$. Now dropping all order unity constants (which I should have done long ago) and using the estimate above for $dn/dE$, this gives $\epsilon^{\Gamma b/\sqrt{E}}$, or a recurrence time that grows like $1/\epsilon^{b}$.

Exercise: Show how with $b < \infty$ the initial packet exits from the confining region. After a short time interval this should look like exponential decay.
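
To get a feel for how quickly this estimate blows up with the box size $b$, here is a small sketch that evaluates the logarithm of the recurrence probability, $N \log\sqrt{\epsilon/2}$ with $N = \Gamma b/\sqrt{E}$. Order-unity constants are dropped exactly as in the text, and the function name and sample numbers are illustrative assumptions.

```python
import math


def recurrence_log_probability(eps, gamma, b, energy):
    """Return (N, log P) with N = gamma*b/sqrt(E) and P = (sqrt(eps/2))**N.

    Working with log P avoids floating-point underflow for large b.
    Order-unity constants are dropped, so only the scaling is meaningful.
    """
    n_states = gamma * b / math.sqrt(energy)
    log_prob = n_states * 0.5 * math.log(eps / 2.0)
    return n_states, log_prob


if __name__ == "__main__":
    energy = math.pi ** 2 / 2.0          # E = k^2/2 with k ~ pi
    for b in (10.0, 100.0, 1000.0):
        n_states, log_prob = recurrence_log_probability(0.01, 0.1, b, energy)
        # The recurrence time scales like 1/P, i.e. exp(-log_prob).
        print(b, n_states, -log_prob)
```

The logarithm of the recurrence time is linear in $b$, so the time itself grows exponentially with the size of the box.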

A second perspective on quantum decay shows other ways in which irreversibility enters. Perhaps the simplest model for quantum decay uses the following Hamiltonian and state function

$$H = \begin{pmatrix} \nu & C^\dagger \\ C & \Omega \end{pmatrix}, \qquad \psi = \begin{pmatrix} x \\ Y \end{pmatrix}. \tag{55}$$

$\Omega = \mathrm{diag}(\omega_1, \dots, \omega_N)$ (an $N$-by-$N$ diagonal matrix with $\omega_k$ on the diagonal), $Y$ and $C$ are $N$-component column vectors, and $\nu$ and $x$ numbers (so $H$ is $(N+1)$-by-$(N+1)$). This is a model for a single level in contact with $N$ levels and could arise in many contexts (e.g., the Jaynes-Cummings model). Our initial conditions will be that the single level is excited, i.e., $x(0) = 1$, $Y(0) = 0$.

The time-dependent Schrödinger equation (with $\hbar = 1$) is

$$\dot{x} = -i\nu x - iC^\dagger Y, \qquad \dot{Y} = -iCx - i\Omega Y. \tag{56}$$

With the stipulated initial conditions, the second equation can be integrated to yield

$$Y(t) = -i e^{-i\Omega t}\int_0^t e^{i\Omega s} C x(s)\,ds. \tag{57}$$

This is substituted into the first equation to give a renewal equation for $x$

$$\dot{x}(t) = -i\nu x(t) - C^\dagger e^{-i\Omega t}\int_0^t e^{i\Omega s} C x(s)\,ds. \tag{58}$$
For convenience define

$$z(t) = e^{i\nu t}x(t) \tag{59}$$

which produces a quantity that would be entirely stationary, but for the interaction. The equation satisfied by $z$ is

$$\dot{z}(t) = -\int_0^t e^{i\nu t} C^\dagger e^{-i\Omega(t-s)} C e^{-i\nu s} z(s)\,ds = -\int_0^t e^{i\nu t} K(t-s) e^{-i\nu s} z(s)\,ds \quad \text{with } K(u) \equiv C^\dagger e^{-i\Omega u} C. \tag{60}$$

Everything is still exact. The function, $K(u)$, appearing in Eq. (60) with argument $t - s$, is of central importance. It is the product of three operators. First $C$ acts on $x$ (cf. Eq. (56)) and brings amplitude into the subspace of decayed states. The amplitude in each mode then evolves for a time $u$, each mode with its own phase (from $\Omega$). Next $C^\dagger$ sums the amplitude (the matrix product produces a scalar) and sends it back to the undecayed subspace. Here's the point: to the extent that the modes in the decay subspace get out of phase, the sum going back into the undecayed subspace is reduced. In this way, $K$ acts so as to shrink the norm in the one-dimensional subspace. The reduction will depend on two factors, $u$ (larger $u$ allows the modes to get more out of phase) and the range of $\Omega$ over which $C$ is large. It is helpful to write $K(u)$ explicitly. The components of $C$ will be denoted $c_k$. $K$ becomes

$$K(u) = \sum_k |c_k|^2 e^{-i\omega_k u}. \tag{61}$$

Since $N < \infty$ we know that this system recurs (by the quantum recurrence theorem). It follows that the continuum approximation for the levels (which we are about to make) is where the mischief of irreversibility lies.

In the continuum limit the quantities $c_k$ are of order $1/\sqrt{N}$, leading to the definition $\gamma(\omega) = c_k\sqrt{N}$. To replace the sum over $k$ by an integral over energy ($\omega$, formerly $\omega_k$), we require the density of states $\rho(\omega) = 1/[N(\omega_{k+1} - \omega_k)]$ (so the $N$'s cancel). In the limit, $K$ becomes

$$K(u) = \int d\omega\, \rho(\omega)|\gamma(\omega)|^2 e^{-i\omega u}. \tag{62}$$

What happens next depends on the physics. Generally speaking $K(u)$ drops off with increasing $|u|$. Thus if the coupling of this level to the (now) continuum drops off, then by the Riemann-Lebesgue lemma $K$ will go to zero. If $\gamma$ has some significant range $\Delta E$, then the spread in $K$ will be $1/\Delta E$ (assuming $\rho$ does not cause mischief). All these properties are what make exponential decay ubiquitous. Having $K$ drop off on some time scale $\Delta t$ means that the system forgets the details of its past (cf. Eq. (60)), so that on larger time scales the decay is Markovian.

Remark: For a single degree of freedom, being Markovian implies exponential decay. Without memory, the next 1 second is like the last 1 second. If the system shrank by a factor $e^{-\gamma}$ ($\gamma > 0$) in the last second, what's left will again shrink by $e^{-\gamma}$.

With $K$ in continuum form the equation for $z$ becomes (with the change of variable $u = t - s$ in the integral) [15]

$$\dot{z}(t) = -\int d\omega\, \rho(\omega)|\gamma(\omega)|^2 \int_0^t du\, e^{i(\nu - \omega)u} z(t-u). \tag{63}$$






Let us now assume that $z$ does not change much on the time scale for which $K$ drops to zero. Then we take it out of the integral, allowing the $u$ integral to be done explicitly. As in Eq. (61), we require $\mathrm{Im}\,\omega > 0$ for convergence, so that our equation becomes

$$\dot{z}(t) = -iz(t)\int d\omega\, \frac{\rho(\omega)|\gamma(\omega)|^2}{\omega - \nu}. \tag{64}$$

To evaluate this we recall the formula

$$\frac{1}{x \pm i\epsilon} = \mathcal{P}\frac{1}{x} \mp i\pi\delta(x) \tag{65}$$

with $\mathcal{P}$ the principal value and it is understood that Eq. (65) is to be used in an integral over $x$. Note that this formula does not require analyticity of the function integrated and can be justified by multiplying numerator and denominator by $x \mp i\epsilon$. As indicated, $\omega$ has a positive imaginary part, so that Eq. (64) becomes

$$\dot{z}(t) = -iz(t)\left[\mathcal{P}\!\int d\omega\, \frac{\rho(\omega)|\gamma(\omega)|^2}{\omega - \nu} - i\pi\rho(\nu)|\gamma(\nu)|^2\right] \tag{66}$$

$$= -iz(t)\Delta E - z(t)\Gamma/2, \tag{67}$$

with $\Delta E$ and $\Gamma$ defined by Eq. (66). We have thus derived exponential decay with a decay rate given by Fermi's Golden Rule. There is also an energy shift.
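
The Golden Rule result can be compared with a direct integration of Eq. (56) for a finite number of levels. The sketch below is only an illustration under stated assumptions: the levels $\omega_k$ are taken equally spaced on a symmetric band, the coupling $c_k = c$ is a real constant, $\nu$ sits at the band center, and a fourth-order Runge-Kutta step is used. With these choices $\rho|\gamma|^2 = c^2/(\omega_{k+1}-\omega_k)$, so $c$ can be tuned to give a prescribed rate $\Gamma = 2\pi\rho|\gamma|^2$. All names and parameter values are hypothetical.

```python
import math


def survival_probability(t_final, n_levels=201, half_width=10.0,
                         rate=1.0, nu=0.0, dt=0.005):
    """Integrate Eq. (56) with RK4 and return |x(t_final)|^2.

    ASSUMPTIONS (illustrative): equally spaced levels omega_k on
    [-half_width, half_width] and a constant real coupling c_k = c.
    """
    spacing = 2.0 * half_width / (n_levels - 1)
    omega = [-half_width + k * spacing for k in range(n_levels)]
    # rho*|gamma|^2 = c^2/spacing, so 2*pi*rho*|gamma|^2 = rate fixes c.
    c = math.sqrt(rate * spacing / (2.0 * math.pi))

    def deriv(x, y):
        dx = -1j * nu * x - 1j * c * sum(y)                  # x' = -i nu x - i C^dagger Y
        dy = [-1j * c * x - 1j * w * yk for w, yk in zip(omega, y)]  # Y' = -i C x - i Omega Y
        return dx, dy

    def shifted(y, dy, h):
        return [yk + h * dk for yk, dk in zip(y, dy)]

    x, y = 1.0 + 0.0j, [0.0j] * n_levels   # x(0) = 1, Y(0) = 0
    for _ in range(int(round(t_final / dt))):
        k1x, k1y = deriv(x, y)
        k2x, k2y = deriv(x + 0.5 * dt * k1x, shifted(y, k1y, 0.5 * dt))
        k3x, k3y = deriv(x + 0.5 * dt * k2x, shifted(y, k2y, 0.5 * dt))
        k4x, k4y = deriv(x + dt * k3x, shifted(y, k3y, dt))
        x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        y = [yk + dt * (a + 2 * b + 2 * d + e) / 6.0
             for yk, a, b, d, e in zip(y, k1y, k2y, k3y, k4y)]
    return abs(x) ** 2


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0):
        print(t, survival_probability(t), math.exp(-t))
```

For times well below the recurrence time $2\pi/(\omega_{k+1}-\omega_k)$ the survival probability should track $e^{-\Gamma t}$, with small deviations from the finite bandwidth and from the slow start at very early times.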

Where did the arrow of time enter? By now you are sensitive to the introduction of infinities as the source of mischief, and as usual it is the continuum limit that gives an arrow. We took $\mathrm{Im}\,\omega > 0$ to be sure $K$ (as expressed in Eq. (61)) converges, meaning converges as $u \to +\infty$. There is an initial state at time zero and the expectation of reasonable behavior for the distant future, with "future" having implicitly a particular sign of $t$. Prior to the continuum limit there was no need to introduce imaginary parts, and there was the prospect of ultimate recurrence.

Remark: The quantum Zeno effect. Return to Eq. (60). Note that as $t \searrow 0$, $\dot{z}$ goes to zero. So although an undisturbed system ultimately has exponential decay (with $|\dot{z}/z| = \Gamma/2 > 0$), things start out slowly. Suppose one were to "check" at a very early time whether the system is still in its original state; then the probability of finding a change would be less than $O(dt)$. This is the basis of the quantum Zeno effect (QZE). You repeatedly measure, or project, at sufficiently short intervals $dt$, and in this way prevent decay altogether. A watched pot never boils! There is a considerable literature on this subject. An early discoverer of this effect (it seems it was discovered several times independently) was L. Khalfin. One recent theoretical paper is [16]. The effect has also been seen experimentally in [17]. The subject of continuous observation (raised during the lecture) arises because it seems at first sight that if one were to stare at a sample of radium (or maybe hold a Geiger counter to it) it should not decay. The reason that it does decay is that all such measurement devices, Geiger counters, eyeballs, etc., have a time scale. If this time scale is shorter than the time interval needed to produce the QZE, then indeed decay is halted. One can build one's own internal observer by extending the Hamiltonian of Eq. (55) slightly:

$$H = \begin{pmatrix} \nu & \Phi^\dagger & 0 \\ \Phi & \omega & \theta^\dagger \\ 0 & \theta & W \end{pmatrix}. \tag{68}$$

$W$, like $\Omega$, is diagonal and represents levels to which the system can decay after it has reached the first quasi-continuum. So these levels "observe" the transition. If the coupling $\theta$ is very strong then "as soon" as the decay takes place it is noticed. This stops the decay. There is also an energy-level interpretation. See [18].
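
The slow start can be made quantitative with a very small sketch. From Eq. (60), $\dot{z}(0) = 0$ and $\ddot{z}(0) = -K(0)$, so for short times the survival probability is approximately $1 - (t/\tau_Z)^2$ with $\tau_Z = 1/\sqrt{K(0)}$. Assuming ideal projective measurements at equal intervals (an idealization, and the names below are illustrative), the survival probability after $n$ checks in a fixed total time is the $n$-th power of the single-interval value.

```python
def survival_with_measurements(total_time, n_measurements, zeno_time=1.0):
    """Survival probability after n ideal projections in total_time.

    Uses the short-time law P(dt) ~ 1 - (dt/zeno_time)**2, which is only
    meaningful when dt = total_time/n_measurements is small compared with
    zeno_time (ASSUMED here; zeno_time stands for 1/sqrt(K(0))).
    """
    dt = total_time / n_measurements
    p_step = max(0.0, 1.0 - (dt / zeno_time) ** 2)
    return p_step ** n_measurements


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        # Survival approaches 1 as the checks become more frequent.
        print(n, survival_with_measurements(1.0, n))
```

Because the loss per interval is quadratic in $dt$ while the number of intervals grows only like $1/dt$, the total loss vanishes as $dt \to 0$.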

Remark: Again I refer to Eq. (60). This time I note that the function $\rho|\gamma|^2$ is bounded from below, that is, below some $E$ there are no levels. By general theorems of the Fourier transform, this implies that the asymptotics of $K$ with $u$ cannot be exponential. In practice, depending on the threshold form of the function, the asymptotic decay behavior must be power law, $t$ to some negative power. This too was predicted by Khalfin. This has never been observed experimentally, and is expected to be difficult to see [19].



III. STOCHASTIC DYNAMICS: IRREVERSIBILITY FROM COARSE GRAINING

In this Section I will introduce information-theory concepts that tie in directly to irreversibility. It is interesting that the proofs of the essential relations go through with few
assumptions, both for classical mechanics and for quantum mechanics. One feature that 
will be important is the question of what is a good "coarse graining." This will also come 
up later in these lectures, and the present discussion will introduce both the issues and the 
formalism. 

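First, classical mechanics. We work on phase space, $\Omega$. Let $\mu$ be the measure on $\Omega$ and $\mu(\Omega) = 1$. Let the dynamics be given by a measure-preserving map $\phi^{(t)}$ on $\Omega$, with $\phi^{(t)}(\omega)$ the time-$t$ image of an initial point $\omega \in \Omega$. The time parameter, $t$, may be either continuous or discrete. For simplicity we fix a time $t$, and call the associated phase space map $\phi$. The notion of "macroscopic" is provided by a coarse graining on $\Omega$. This is a finite collection of sets of strictly positive measure that cover $\Omega$: $\{\Delta_\alpha\}$, $\alpha = 1, \dots, G$, with $\cup_\alpha \Delta_\alpha = \Omega$, $\Delta_\alpha \cap \Delta_\beta = \emptyset$ for $\alpha \neq \beta$. Let $\chi_\alpha$ be the characteristic function of $\Delta_\alpha$ and let $\nu_\alpha = \mu(\Delta_\alpha)$. If $f$ is a function on $\Omega$, its coarse graining is defined to be

$$\hat{f}(\omega) \equiv \sum_\alpha \frac{\chi_\alpha(\omega)}{\nu_\alpha} f_\alpha, \quad \text{with } f_\alpha = \int_\Omega d\mu\, \chi_\alpha(\omega) f(\omega). \tag{69}$$

Note that $f_\alpha/\nu_\alpha$ is the average of $f$ on $\Delta_\alpha$, and that $\int_\Omega \hat{f} = \int_\Omega f$. The function $f$ has been replaced by a step function that is constant on the grains, in such a way that its total integral has been preserved. Eq. (69) can also be written

$$\hat{f}(\omega) = \sum_\alpha \frac{\chi_\alpha(\omega)}{\nu_\alpha} \int_\Omega d\mu(\omega')\, \chi_\alpha(\omega') f(\omega'). \tag{70}$$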

Let the system's distribution in $\Omega$ be described by a density function $\rho(\omega)$. We take the primitive entropy to be

$$S_{\rm prim} = -\int_\Omega \rho(\omega)\log(\rho(\omega))\, d\mu. \tag{71}$$

Note that this differs by a (possibly infinite) constant from entropy defined on a discrete set (see Eq. (21)). This is because discretizing phase space would give $-\sum \rho\,\Delta x \log(\rho\,\Delta x)$ as the usual information entropy. $S$ in Eq. (71) differs from this by $\log \Delta x$ and as a result may be positive or negative. By virtue of the fact that $\phi^{(t)}$ is measure preserving and invertible, $S_{\rm prim}$ is constant in time.
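The entropy that I will show to be non-decreasing is the coarse-grained entropy, and is defined as

$$S(\rho) \equiv S_{\rm prim}(\hat{\rho}) = -\int_\Omega \hat{\rho}\log\hat{\rho}\, d\mu, \tag{72}$$

with $\hat{\rho}$ formed from $\rho$ as in Eq. (69). By the definitions,

$$S(\rho) = -\int d\mu(\omega) \sum_\alpha \frac{\rho_\alpha}{\nu_\alpha}\chi_\alpha(\omega) \log\left(\sum_\beta \frac{\rho_\beta}{\nu_\beta}\chi_\beta(\omega)\right) \tag{73}$$

$$= -\sum_\alpha \rho_\alpha \log\left(\frac{\rho_\alpha}{\nu_\alpha}\right) \tag{74}$$

$$\equiv S(\rho_\alpha|\nu_\alpha), \tag{75}$$

where the second line follows since there is no contribution unless $\alpha = \beta$; in that case all one gets is integration over particular grains. On each grain $\hat{\rho}$ is constant and the integral over $\chi_\alpha$ yields $\nu_\alpha$, cancelling the one inside the sum.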
The function $S(p|q)$ is the relative entropy defined by

$$S(p|q) \equiv -\sum_x p(x)\log\left(\frac{p(x)}{q(x)}\right), \tag{76}$$

where $p$ and $q$ are probability distributions such that $q(x)$ vanishes only if $p(x)$ does. Up to a sign, $S(p|q)$ is also known as the Kullback-Leibler distance. Note that $\sum_\alpha \rho_\alpha = \int \rho = 1$, and that all $\nu_\alpha > 0$. We will use two properties of relative entropy that I will not prove:

$$S(p|q) \le 0 \tag{77}$$

$$S(p|q) \le S(Rp|Rq) \tag{78}$$

where $R$ in Eq. (78) is a stochastic matrix. A stochastic matrix has the following two essential properties: $R_{xy} \ge 0$ and $\sum_x R_{xy} = 1$, for all $y$ (and $x$ and $y$ are the matrix indices). The significance of such a matrix is that it represents the probability of a transition from $y$ to $x$ in unit time for a Markov process on a set $X$, with $x, y \in X$. (Some authors define $R$ the other way round.) These properties are proved in [20, 21] and also in [22].

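What I will now show is that $S(\phi(\hat{\rho})) \ge S(\hat{\rho})$. In other words, if you take the coarse-grained distribution function forward in time and again coarse grain it, entropy increases.

We already know $S(\rho)$, it is just $S(\rho_\alpha|\nu_\alpha)$ (there is a slight abuse of notation in using $\alpha$ in both arguments of $S$, but let's not get fussy). We next examine the coarse graining of the time image of $\hat{\rho}$. To avoid enormous "hat" symbols, I write $\hat{a} = [a]_{cg}$.

$$[\phi(\hat{\rho})]_{cg}(\omega) = \left[\sum_\alpha \frac{\rho_\alpha}{\nu_\alpha}\chi_\alpha(\phi(\omega))\right]_{cg} \tag{79}$$

$$= \sum_\beta \frac{\chi_\beta(\omega)}{\nu_\beta} \sum_\alpha \frac{\rho_\alpha}{\nu_\alpha} \int d\mu(\omega')\, \chi_\beta(\omega')\chi_\alpha(\phi(\omega')) \tag{80}$$

$$= \sum_\beta \frac{\chi_\beta(\omega)}{\nu_\beta} \sum_\alpha \frac{\rho_\alpha}{\nu_\alpha}\, \mu\left(\Delta_\beta \cap \phi^{-1}(\Delta_\alpha)\right) \tag{81}$$

$$\equiv \sum_\beta \frac{\chi_\beta(\omega)}{\nu_\beta}\rho'_\beta, \tag{82}$$

where Eq. (80) uses Eq. (70). Eq. (82) defines the new coefficients $\rho'$ which we can write as

$$\rho'_\beta = \sum_\alpha R_{\beta\alpha}\rho_\alpha \tag{83}$$

with

$$R_{\beta\alpha} \equiv \frac{\mu\left(\Delta_\beta \cap \phi^{-1}(\Delta_\alpha)\right)}{\nu_\alpha}. \tag{84}$$

TABLE I: Classical-quantum correspondence for the entropy increase proposition.

classical          quantum                              classical          quantum
$\Omega$           $\mathcal{H}$ (Hilbert space)        $\rho$             $\rho$ (density matrix)
$\Delta_\alpha$    $\mathcal{H}_\alpha$ (subspace)      $\int d\mu$        Trace
$\chi_\alpha$      $P_\alpha$ (projector)               $\nu_\alpha$       dimension of $\mathcal{H}_\alpha$
$\phi^{(t)}$       $U_t$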

The proof can now be easily completed by noting first that $R$ is stochastic and second that it has eigenvector $\nu_\alpha$ with eigenvalue unity. The proof of stochasticity follows by adding all $\beta$ and, since $\cup_\beta \Delta_\beta = \Omega$, the result is one. The eigenvector property is similarly immediate. It follows that the entropy of $[\phi(\hat{\rho})]_{cg}$ is $S(\rho'|\nu) = S(R\rho|R\nu) \ge S(\rho|\nu)$.
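
The whole argument can be tried out on a toy phase space. In the sketch below, which is an illustration under stated assumptions rather than the construction used in these lectures, $\Omega$ is a set of equally weighted points, $\phi$ is a random permutation (hence measure preserving and invertible), and the grains are three blocks of unequal size. The matrix is built as $R_{\beta\alpha} = \mu(\Delta_\beta \cap \phi^{-1}(\Delta_\alpha))/\nu_\alpha$; all function names are hypothetical.

```python
import math
import random


def relative_entropy(p, q):
    """S(p|q) = -sum_x p(x) log(p(x)/q(x)); terms with p(x) = 0 vanish."""
    return -sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)


def grain_measures(grain_of, n_grains):
    """nu_alpha = mu(Delta_alpha) for points of equal weight 1/n."""
    n = len(grain_of)
    nu = [0.0] * n_grains
    for g in grain_of:
        nu[g] += 1.0 / n
    return nu


def transition_matrix(phi, grain_of, n_grains):
    """R[beta][alpha] = mu(Delta_beta intersect phi^{-1}(Delta_alpha)) / nu_alpha."""
    n = len(phi)
    nu = grain_measures(grain_of, n_grains)
    R = [[0.0] * n_grains for _ in range(n_grains)]
    for w in range(n):
        beta = grain_of[w]         # w lies in grain beta
        alpha = grain_of[phi[w]]   # phi(w) lies in grain alpha
        R[beta][alpha] += (1.0 / n) / nu[alpha]
    return R, nu


def apply(R, p):
    """Matrix-vector product (R p)_beta = sum_alpha R[beta][alpha] p[alpha]."""
    return [sum(R[b][a] * p[a] for a in range(len(p))) for b in range(len(R))]


def demo(n_points=1000, seed=0):
    rng = random.Random(seed)
    phi = list(range(n_points))
    rng.shuffle(phi)   # a random permutation: measure preserving, invertible
    grain_of = [0 if w < 100 else 1 if w < 400 else 2 for w in range(n_points)]
    R, nu = transition_matrix(phi, grain_of, 3)
    p = [1.0, 0.0, 0.0]   # all probability initially in the smallest grain
    return R, nu, p, apply(R, p)


if __name__ == "__main__":
    R, nu, p, p_new = demo()
    print("column sums:", [sum(R[b][a] for b in range(3)) for a in range(3)])
    print("R nu:", apply(R, nu), "nu:", nu)
    print("entropy before:", relative_entropy(p, nu))
    print("entropy after: ", relative_entropy(p_new, nu))
```

The grains have different sizes on purpose: the entropy increase comes from the probability spreading over several grains, not from the grains growing.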

It is remarkable that this proof goes through for any partition. In particular, you might 
have thought that the grains through which a system passes as it moves toward equilibrium 
must get larger and larger. This would be in keeping with the idea that the system is really 
in some single grain and its entropy is the logarithm of that grain's volume. That may 
happen but it need not, since entropy increase occurs through uncertainty over which grain 
it's actually in. Put otherwise, the increase occurs by virtue of the density's spreading to 
many grains, irrespective of their individual sizes. 

Where then does the physics enter? It is in the assertion that it is physically meaningful 
to coarse grain the image of the coarse-grained p. The implicit assumption is that within 
each grain the phase space points have spread uniformly. Thus t cannot be arbitrarily small 
and must exceed a microscopic relaxation time associated with the coarse grains. 
Exercise: This can also be phrased in terms of the probabilities that the $\phi$-image of a subset of $\Delta_\alpha$ goes to the grains in the image of $\Delta_\alpha$ in the same proportion that $\Delta_\alpha$ itself does.

When I first worked out this little calculation I thought that for the quantum case things 
would be more complicated, with all sorts of phases entering and giving the desired result 
only if they miraculously cancelled. This turns out not to be the case and the quantum 
version is as straightforward as its classical counterpart. 

I will not go through the details. One only needs a dictionary, as given in Table I. Thus,

$$\hat{\rho} = \sum_\alpha \frac{P_\alpha}{\nu_\alpha}\rho_\alpha, \quad \text{with } \rho_\alpha = \mathrm{Tr}\,P_\alpha\rho \text{ and } \nu_\alpha = \mathrm{Tr}\,P_\alpha. \tag{85}$$

Entropy is again $S(\rho_\alpha|\nu_\alpha)$. (The $\nu$'s no longer sum to unity, but this makes no difference.) Time evolution is given by a unitary operator, $U_t$, acting in the usual way: $\rho(t) = U_t\rho(0)U_t^\dagger$. Carrying through the same steps as for the classical case, coarse graining, evolving in time and coarse graining again, leads to the same equations, but with the matrix $R$ now given by

$$R_{\beta\alpha} = \frac{\mathrm{Tr}\left(P_\beta U_t P_\alpha U_t^\dagger\right)}{\nu_\alpha}. \tag{86}$$

Stochasticity of $R$ is readily established and entropy non-decrease follows as above.
Exercise: Check that $R$ is real.

FIG. 9: Three images of a glass of water with an ice cube in it. (Panels: One p.m., Two p.m., Three p.m.)

IV. INITIAL CONDITIONS VS. FINAL CONDITIONS: A RESTATEMENT OF THE SECOND LAW

Given the present-time macrostate of a system, the second law of thermodynamics can be 
phrased as a distinction between the way we compute the later state of the system, and the 
way we estimate its earlier state. These calculations are called prediction and retrodiction. 

Consider the situation pictured in Fig. 9. I show three images of a glass of water with an ice cube in it. Suppose you are told that at 2 p.m. the water is at temperature $T_2$ and that the ice cube volume is $V_2$, as shown in the center image. The system is isolated between 2 p.m. and 3 p.m. (the walls of the larger container are not shown) and you are asked to predict the water temperature and ice cube size at 3 p.m.

Your prediction should look like Fig. 9c: a smaller ice cube and colder water, $V_3 < V_2$ and $T_3 < T_2$. In principle, how did you do your calculation? The middle picture (9b) is a macroscopic description and has an associated region of phase space, call it $\Gamma_2$. Every microstate $\gamma \in \Gamma_2$ looks like Fig. 9b. To make your 3 p.m. prediction, you evolve each $\gamma$ forward by one hour. The vast majority will look like Fig. 9c, and this is your predicted macrostate. Explicitly stated: you average over the microscopic states of $\Gamma_2$, and give them all the same weight. This reflects a fundamental principle of statistical mechanics. For example, for an isolated system (as this one is) the partition function sums equally over all states.

This procedure can be described in phase space. In Fig. 10, the oval in the middle represents the phase space region $\Gamma_2$. All microstates start in $\Gamma_2$ at 2 p.m. Let the operator mapping phase space points forward one hour be $\phi$. The statement about equal a priori probabilities at 2 p.m. is the statement that our 2 p.m. phase space measure is the characteristic function of $\Gamma_2$. By 3 p.m. the system has evolved to $\phi(\Gamma_2)$. (See Fig. 10.) This is no longer a macroscopic state. The image of a coarse grain need not be a coarse grain. The typical situation is that $\phi(\Gamma_2)$ falls into one or several coarse grains at the later time. Sometimes the image grains will be far apart, for example for the phase space description of a person throwing dice. The objective of a dice throw is to make what looks like a single macroscopic initial state go into several distinct macroscopic final states. Yet another feature of $\phi(\Gamma_2)$ is that small portions of it do not look like the right-hand figure at all. This will be clear later.

FIG. 10: Phase space. Schematic illustration of $\Gamma_2$, $\phi(\Gamma_2)$ (the dynamical image of $\Gamma_2$ after 1 hour), $\Gamma_1$, and $\phi(\Gamma_1)$.

The macroscopic state associated with the right-hand picture in Fig. 9 is some subset of phase space, call it $\Gamma_3$. (This is not pictured in Fig. 10.) The conclusion to be drawn from the last paragraph is that, to a good approximation, $\phi(\Gamma_2) \subset \Gamma_3$. What a "good approximation" means is that $\mu(\phi(\Gamma_2) - \Gamma_3) \ll \mu(\Gamma_2)$, where $\mu$ is the phase space measure. So $\Gamma_2$ has found its way into $\Gamma_3$, but how much of $\Gamma_3$ does it occupy? The answer is, very little. The phase space volume $\Gamma_2$ has been stretched, whirled and tendrilled so as to occupy $\Gamma_3$, all the while maintaining its original phase space volume $\mu(\Gamma_2)$. This connects our discussion to the second law of thermodynamics. The entropy at 2 p.m. is the logarithm of the volume of $\Gamma_2$. The entropy at 3 p.m. is the logarithm of $\mu(\Gamma_3)$. For a glass of normal size the ratio of the phase space volumes will be on the order of $\exp(N_A)$, with $N_A$ Avogadro's number. So it is the fact that a coarse grain at one time spreads into many coarse grains, or larger coarse grains, at later times that corresponds to the second law of thermodynamics. Implicit is the assumption that the earlier grain was uniformly occupied, otherwise you couldn't be sure that all the regions at the later time were also occupied.

Now a more subtle question: You have additional information: the system was isolated from 1 p.m. to 3 p.m., not just from 2 p.m. What was the system state at 1 p.m.? One way to answer would be to apply $\phi^{-1}$ to $\Gamma_2$ and identify the coarse grains needed to contain most of $\phi^{-1}(\Gamma_2)$. Since $\phi$ and $\phi^{-1}$ are the same, or essentially the same, the 1 p.m. macrostate of the system would be the smaller cube shown on the left of Fig. 9 (along with colder water).

Of course it doesn't happen that way. Had you started with a small cube at 1, you would 
have an even smaller one at 2. Obviously there is a different rule for retrodicting. What is 
it? 

There is no unique answer. Here is one way. Make a hypothesis about the 1 p.m. macrostate. This corresponds to some set $\Gamma_1$ in phase space (illustrated on the left of Fig. 10). Propagate it forward one hour to get $\phi(\Gamma_1)$. If this lies substantially within $\Gamma_2$ (as it does in Fig. 10), then your hypothesis was acceptable. For example, the larger cube on the left of Fig. 9 might be a precursor to the middle picture. But other precursors are also possible. For example, you might have had a smaller cube plus a little shaving of ice that melted entirely during the hour. The coarseness of your coarse grains may preclude your being able to discriminate between these cases. You may even have several hypotheses and want to use other information to weight your guesses. For example you may know that all available ice came from a machine whose maximum cube size is smaller than the large one pictured on the left, implying that there had to be some small chips at 1 p.m. [23].

You do know however that the combined phase space volume of all acceptable hypotheses is vastly smaller than $\mu(\Gamma_2)$. This is because most of the volume in $\Gamma_2$, when propagated back to 1 p.m., gives the single smaller ice cube, the one with the question mark in it in the figure. So if $\Gamma_1$ is taken to represent the union of all acceptable 1 p.m. phase space regions, $\mu(\Gamma_1) \ll \mu(\Gamma_2)$.

I summarize: To predict, take — with equal weight — all microstates consistent with the 
macrostate and propagate them forward in time. Then coarse grain to get your macroscopic 
prediction. To retrodict, go back to the earlier time and make macroscopic hypotheses. Use 
each such hypothetical macrostate to predict the present state. If it agrees, it was a possible 
precursor. In general, the phase space volume of all such precursors is less than or equal to 
that of the subsequent macrostate. In this way there is entropy non-decrease for an isolated 
system when moving forward in time. These different rules for prediction and retrodiction 
are an alternative statement of the second law of thermodynamics. 

Here is yet another statement of the past/future distinction. Suppose you know the 2 p.m. macrostate and are asked what you know about its microstates. If all you are concerned with is prediction, then you say, all microstates are equally likely. But if you happen to know that the system had been isolated since 1 p.m., then the vast majority of those microstates are rejected and all that are allowed are those in the set $\phi(\Gamma_1)$, with $\Gamma_1$ your collection of acceptable hypotheses. In other words, for an initial state you treat all microstates equally, but if something is a final state the vast majority of its microstates are rejected.

The foregoing observation shows that the assumptions of statistical mechanics are even stronger than was already stated. One way to make a prediction is to give equal a priori probability to all initial microstates. But that's not the only way. For our system isolated from 1 p.m. to 3 p.m., the set of 2 p.m. microstates is now known not to be $\Gamma_2$, but $\phi(\Gamma_1)$, a much smaller set. That means that our 3 p.m. macrostate need not be calculated from $\phi(\Gamma_2)$, but from $\phi(\phi(\Gamma_1))$. Does anyone think the 3 p.m. result will be any different? All experience says otherwise. It does not matter whether you use all of $\Gamma_2$ or only the part that results from a state that had been isolated earlier. Presumably, $\phi(\Gamma_1)$ fills $\Gamma_2$ in a pseudo-random way, so that its points are representative of what all of $\Gamma_2$ is doing.

1. Past/future reliability: not a restatement of the second law 

The feeling that the past has occurred, is known and unchangeable, while the future is open, is a psychological arrow of time. Here I distinguish this arrow from the notion of reliability. The relative accuracy of prediction and retrodiction is a separate issue.

In 1962, Gold (quoting Levinger) gave the example of a Russian satellite. Here prediction 
is more effective than retrodiction. A satellite in orbit allows accurate prediction of its free 
trajectory. However, it is difficult to deduce where it was launched. (In those Cold War 
days, such information was often secret.) 

Consider a roulette wheel. Saying that the system is difficult to predict means that the set $\phi(\Gamma_2)$ has significant weight in more than one macrostate. To state this precisely, let $\{\Delta_k\}$ be the phase space grains at time $T_3$ ($k$ runs over an index set). Uncertainty then means that $\Pr(\Delta_k|\phi(\Gamma_2)) = \mu(\Delta_k \cap \phi(\Gamma_2))/\mu(\Gamma_2)$ is well away from zero for several $k$. For a roulette wheel, from the initial condition, you cannot tell into which trap the ball will fall. For a symmetric example, imagine a slightly nonconventional roulette wheel. Prior to time $T_1$ the ball is at rest in one of the traps. At $T_1$ the wheel is spun vigorously. The ball leaves the trap it is in, bounces around, and then (at $T_3$) is caught by another trap. Now you look at the time $T_2$ position, with $T_1 < T_2 < T_3$. You don't know where it came from and you don't know where it is going. Call the $T_1$-coarse grains $\delta_k$. With this notation, your various time $T_1$ hypotheses would be realized with probability proportional to $\mu[\Gamma_2 \cap \phi(\delta_k)]$, and your uncertainty would correspond to this being relatively large for several values of $k$.

Why then when we remember something does it seem more certain than when we predict? 
This is because we are not retrodicting from the system itself, but from an auxiliary system, 
our brain, on which the earlier state of the system caused modification of neural coarse 
grains, i.e., wrote to memory. The fact that coarse grains in the brain have relatively few 
precursors is a property you want for any recording system. 

V. ARROWS OF TIME 

• The thermodynamic arrow. Entropy increase, heat cannot be converted to work, rules 
for predicting and retrodicting (with caveats, modifiers, etc.) 

• The biological arrow. 

— Digestion. Presumably follows the thermodynamic arrow 

— The past is accomplished, done. The future is changeable. This is a tougher nut, 
if you want to attribute it to the thermodynamic arrow. I have tried to narrow 
the issues by considering the following notion: "A computer's arrow of time" 
[24]. Imagine a weather-predicting computer. It has long term memory, records 
of external weather as well as its own calculations. It has short term memory, 
its own current calculations, perhaps non-archived new data. (On a PC this
could be the hard drive-RAM distinction.) It may have sensory organs, direct 
data feeds. It may be able to seek more data, sending up a weather balloon or 
instructing a human to do so. It may also be able to act, ordering the seeding of 
clouds, based perhaps on goals that have been programmed into the machine. It 
certainly deals with the past and future in different ways, but one would nevertheless 
ascribe all these asymmetric phenomena to the thermodynamic arrow. 

Which leads me to recall the last bastion of the biological arrow: 

— Consciousness. Who knows? This concept is the playground of neurophysiolo- 
gists, philosophers and the occasional brave physicist [25]. But not (yet?) me. 
The philosophers use notions that I haven't been able to figure out, for example, 
qualia. Perhaps you'll do better. Here's Webster: quale: a sense-datum or feeling having its own particular quality without meaning or external reference <a quale. . . such as, say, a certain shade of purple occurring in the presentation of
a certain piece of cloth in a particular light — C.G.Hempel.> 

— Remark: Despite dissipation, the world has gotten more interesting in the last 
4 Gyr. It's because the Earth is an open system, the beneficiary of an enormous
negative entropy input. Photons of 6000K arrive from the sun, and assuming a
rough energy balance, 20 of them leave at 300K. So the sun's gift to us is not so 
much energy (which, if it does accumulate, can cause problems), as negentropy. 
The same is true for computers, some of whose power goes into a fan, whose job 
it is to remove energy from the overheated chip. I believe the first to appreciate 
this was Schrödinger [26].

• The CP arrow. Given the observed CP violation, CPT implies T asymmetry. So the 
laws of physics are presumably fundamentally time asymmetric. 

I don't believe this asymmetry is relevant to the thermodynamic arrow. Having dif- 
ferent dynamics in one direction and the other should not make a difference (at least 
the kind of difference we see in the Second Law) provided the laws induce equilibrium 
in roughly the same way (and are not intrinsically dissipative). 

In Sec. VII B, when more tools for assessing the impact of asymmetry are developed, 
I'll give a cat map-based example that supports this opinion. I also remark that in 
the early 1970's there were some attempts to relate the thermodynamic arrow to the 
CP arrow, but none as far as I know met with success. 

• The cosmological arrow. The universe is expanding. 

• The radiative arrow. Variously defined: one uses outgoing wave boundary conditions 
for electromagnetic radiation; retarded Green's functions are used for ordinary calcula- 
tions; radiation reaction has a certain sign; more radiation escapes to the cosmos than 
comes in. Because the theory is time-symmetric these phenomena have reverse-time 
descriptions also, so the arrow is not precise. Let me single out two candidates. 

— Radiation reaction. It costs you energy to shake a charged particle. I would relate 
this to the thermodynamic arrow. It also costs you energy to stir your coffee, but 
only if the coffee began at rest. 

- Escape of radiation to the cosmos. Olbers' paradox suggests this is a consequence
of the cosmological arrow. 
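
The photon counting in the remark on negentropy in the list above (one 6000 K photon in, about 20 photons at 300 K out) can be made explicit. This is only a rough sketch. It assumes that a typical photon carries energy of order $k_B T$ and that the entropy carried by radiation of energy $E$ at temperature $T$ scales as $E/T$; numerical prefactors are the same for incoming and outgoing radiation and cancel in the ratios. The symbols $T_\odot$, $T_\oplus$ and $n_{\rm out}$ are notation introduced here for illustration.

```latex
\begin{align}
  n_{\rm out}\, k_B T_\oplus &\approx k_B T_\odot
  \quad\Longrightarrow\quad
  n_{\rm out} \approx \frac{T_\odot}{T_\oplus}
             = \frac{6000\ \mathrm{K}}{300\ \mathrm{K}} = 20, \\
  \frac{S_{\rm out}}{S_{\rm in}} &\approx \frac{E/T_\oplus}{E/T_\odot}
             = \frac{T_\odot}{T_\oplus} = 20 .
\end{align}
```

Under these assumptions the energy budget balances, while roughly twenty times more entropy leaves than arrives.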



VI. CORRELATING ARROWS OF TIME 

Around 1960 Thomas Gold argued that the thermodynamic arrow is a consequence of the 
cosmological arrow. Your coffee cools because the galaxies recede. He said that it's because 
expansion allows the universe at large to serve as a "sink" in the thermodynamic sense. He 
argued that in this way the global could impose an arrow on the local. In the following 
argument he assumes that the radiative arrow (radiation leaves us, "never" to return) is a 
consequence of the cosmological arrow and shows by example how that can impose a local 
arrow. Here's the argument in Gold's words [27]: 

Let us take, for example, a star, and suppose we could put it inside an in- 
sulating box. . . . when the star has been in the box for long enough (which in 
this case will perhaps be rather long), time's arrow will have vanished. . . . now 
if we were to open for a moment a small window in our box, then what would 
happen? Time's arrow would again be defined inside the box for some time, 
until the statistical equilibrium had been reestablished. But what had happened
when we opened the hole? Some radiation had, no doubt, escaped from the box
and the amount of radiation that found its way into the box from the outside
was incomparably smaller.

FIG. 11: Illustration for Gold's star-in-a-box parable. Five snapshots.

The escape of radiation away from the system is, in fact, characteristic of 
the type of "influence" which is exerted from outside. . . . The thermodynamic 
approach would be to explain that . . . free energy can only be generated from 
the heat sources in the world by means of heat engines working between a source 
and a sink. There may be a variety of sources, but the sink is always eventually 
the depth of space, . . . 

It is this facility of the universe to soak up any amount of radiation that . . . 
enables it to define the arrow of time in any system that is in contact with this 
sink. 

In Fig. 11 I have illustrated five stages in this story. The star, which has some base 
value of entropy, is initially not much affected by the confinement. As the star ages, it will 
tend to equilibrate within the box. I won't worry about details, like finding a box that 
confines neutrinos as well as everything else, and will assume that eventually the box is 
filled uniformly by very hot gases, including photons. As with all equilibration processes, 
the entropy will rise in the many-billion-year course of this process. Then the window is 
opened and photons escape, rather than enter. The entropy in the box decreases during 
the short interval when the window is open, but once closed equilibration sets in again and 
an entropy maximum is again reached. This entropic history is shown, slightly mismatched 
horizontally, in Fig. 12, with a better view of the entropy alone in the upper portion of 
Fig. 13. The time units for the figure take 0 as the start of the story, 100 as the time the
window is opened, 105 when it is closed. 

The absolutely essential point, on which Gold bases his conclusion, is that during the
recovery period — after time 105 — the entropy climbs, presumably to a bit below its previous 
equilibrium value. Note that for t > 105 entropy is increasing, whereas for times less 
than 100 it was constant (except of course for fluctuations, also schematically indicated). 
In this way, the universe, and in particular the outgoing wave boundary condition, which 
determined that the time 100 to 105 flux was outward, has imposed its arrow on the local 
region, i.e., within the box. Our coffee cools because we are in contact with the universe, and
it is expanding. 

Does this show how we inherit the thermodynamic arrow from the cosmological arrow? 
Unfortunately, not. Suppose that when you opened the box radiation entered rather than
left. Just as would occur for departing radiation, the system will be thrown out of equilibrium.
The final entropy might be larger (see Fig. 13), but the essential feature, the fact that
the outside influence reestablished an arrow of time in the box, would still hold.

FIG. 12: Star-in-a-box story with schematic view of entropy dependence.

FIG. 13: Possible time dependence of entropy for the star-in-a-box. Upper graph: as expected
from the star-in-a-box parable. Initially entropy increases as the system approaches equilibrium.
At time 100 (in the figure units) the window is opened and the entropy drops. After closing the
window (at time 105), equilibrium is restored. Only the entropy inside the box is shown. Lower
graph: Same as before, except that radiation is assumed to enter during the time interval [100, 105].

This counterexample does not contradict the importance of outside factors in imposing a 
local thermodynamic arrow, but it does show that the arrow is not necessarily a consequence 
of the outgoing property. 

It is instructive to ask, where did the arrow that we "derived" actually come from? Why 
did we naturally accept that the behavior should resemble that in Fig. 13? Why should the
return to equilibrium follow the opening of the window? The answer is that the arrow of
time in Gold's argument is the arrow of time of the narrator. In telling the story it is natural 
for us to assume that the response of the system follows, in our time sense, the opening of 
the window. This is the same assumption that is involved in the use of initial conditions in 
the posing of macroscopic, dissipative problems. 

This is an example of how, in discussing the arrow of time, ordinary language presents us 
with traps. In the foregoing argument the "natural" use of initial conditions is what gave 
the arrow. As discussed earlier, this use of macroscopic initial conditions is another way of 
formulating the thermodynamic arrow of time. So in the end the argument is circular. 

What is the alternative? If initial conditions prejudice the answer, don't use them. If the 
claim is that the thermodynamic arrow is a consequence of the expansion of the universe, 
then one way to handle the problem would be to give partial boundary conditions at two 
different times. At one time the universe is in a contracted state, at the other, it is much 
larger. Then you would try to give the same kind of information at those two times, so 
that it would only be the expansion that could give rise to the arrow. Even better, since it 
might not be clear what "same kind of information" means, you might consider an oscillating 
universe and then give the same kind of information for times a bit after the big bang and a 
bit (the same bit) before the big crunch. For example, you might propose local equilibrium 
5 minutes into history and 5 minutes before the end of history. 

VII. THE UNIVERSE WITH TWO-TIME BOUNDARY CONDITIONS 

For the sake of the logic, and not necessarily reflecting our actual cosmology, we assume a 
roughly time-symmetric universe, with expansion and contraction. In our expanding phase 
there was an important transition: In early times short-range forces dominated and the 
universe was homogeneous (evident in the cosmic background radiation distribution). This
was the highest entropy state, the most likely macroscopic state of the universe, given the 
forces governing the motion of matter at the time. 

On the other hand, with rapid expansion, the matter separated into clumps, with distant 
clumps no longer interacting by short range forces, but rather by gravity. Gravity does 
not favor homogeneity. With time the local configuration separated into stars, galaxies, 
clusters of galaxies, etc. So what was the highest entropy state — homogeneity — becomes 
an extremely unlikely state. You'll notice I avoid assigning an entropy or using the word 
equilibrium, since with long range forces those concepts become problematic. Nevertheless, 
once gravity becomes the dominant force, the system finds itself in what would be an unlikely 
state had gravity been dominant before. If we wanted to model this in a non-gravitational 
context it would be reasonable to consider two-time boundary conditions with low entropy 
macroscopic states at each end. 

1. How the universe imposes its arrow. 

The foregoing argument is the decisive one, even if you don't want to talk about two-time
boundary conditions. Boltzmann, in trying to understand the arrow, postulated an
enormous fluctuation, from which we're still recovering. In that way he could get to low 
entropy, after which a steady rise in entropy becomes a standard problem in statistical 
mechanics. The mechanism I just discussed (the change in the dominant force, brought
about by the universe's expansion, and the transition from a homogeneous state being likely
to being unlikely) already explains how entropy could have become small relative to what
is dynamically appropriate, without invoking a fluctuation.

Once you have a low entropy state at a particular time in the universe's history, coupled 
with the expansion that gives rise to that low entropy state, you have a good case for the 
induction of the thermodynamic arrow by the cosmological one. As the universe continues 
to expand, things begin to clump together in galaxies and stars and densities become high 
enough for nuclear reactions to begin in the cores of stars. Such nuclear reactions represent 
the slow transition of the stars out of a metastable state (in the first approximation, all 
matter would like to be iron). These slow transitions keep us alive and able to maintain
our own metastable existence through the negentropy flux from the sun. 

In describing the formation of metastable states in the "future" of the change in the 
dominant forces, I am not falling prey to circularity. It would not be correct to say, "Aha, 
there is a low entropy state at some time, and the entropy should get bigger in both 
directions away from that minimum (as discussed earlier for the Boltzmann H-theorem)." 
The dynamics is not symmetric about this point in time, precisely because of the expansion. 
Remark: Another low entropy feature emerging from the dynamics, without having to invoke 
a fluctuation, is the fact that the material available near the big bang (or big crunch, if things 
are symmetric) is hydrogen, rather than heavier elements. Thus, once gravity clumps all the 
low mass nuclides, they can gain energy by combining to form heavier elements. 

So there are two pieces to the argument relating the thermodynamic arrow and the 
cosmological one. First expansion creates a state that is extremely unlikely under the forces 
dominant at a particular time. Second, given such an unlikely state the entropy increases 
in the direction of time that we generally call forward. For this second part, the two- 
time boundary value problem can help. If for the sake of logical consistency you assume a 
time-symmetric universe, and if you don't want to impose the arrow of time as a separate 
postulate, then it is natural to assume that the uniform state of matter also exists sufficiently 
close to the (future) collapsed state. To know what happens in between, you solve the two- 
time boundary value problem. 

A. Solving two-time boundary value problems 

A number of two-time boundary value problems can be solved. The physical behavior 
we're trying to model typically shows good equilibration properties, so that I've had my most 
informative results using mixing transformations like the cat map. (Harmonic oscillators, 
easy to solve, equilibrate poorly.) 

Using the concepts we developed earlier in connection with the study of irreversibility 
I now pose a two-time boundary value problem for the cat map. I'd like to think of this 
map as acting on particles in an ideal gas, say N points in, as usual, $I^2$. With the coarse
graining used in Sec. IF, the two-time boundary condition with low entropy at both ends 
is posed by demanding that all the points in the gas begin in a single grain, and also find 
their way back into a specific grain at some particular later time, say T. 

In Fig. 14 I show four of the early time steps for a "gas" of 500 points, all of which 
begin in a single coarse grain. The images do not differ from what is shown in Fig. 4 in any 
perceptible way (except for the change in grain and in the initial location). Next (Fig. 15) 
I fast forward a bit showing times 6, 8, 14, and 17. Sometime after 6 you lose any vestige 
of recognizable pattern. (You'll see this soon in an entropy plot.) At time step 14 it's still
a big mess. But by 17 you can see something's happening. Continuing in Fig. 16 are times
19, 20, 21, and 22. Now the peculiarity at 17 is explained. The points were on their way
to meet a final boundary condition. As a minor technical remark, note that the lines on
the return are perpendicular to those on the way out. They're (roughly) in the direction of
the eigenvectors of the cat map, and the stretching for the map and its inverse run along
different eigenvectors (perpendicular in this case since there are only two, and the matrix is
Hermitian).

FIG. 14: Four early time steps (0, 1, 2, 4) for 500 points initially placed in a particular coarse
grain.

FIG. 15: Time steps 6, 8, 14, and 17.

Two-time boundary value problems can be difficult to solve, but because this is an ideal gas,
finding the solution I just displayed is easy. All I did was start with (a few more than) 50000 
points and keep the 500 of those that landed where I wanted them (after 22 time steps). 
Removing the others did not affect the motion of those I kept. 
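
The selection procedure just described can be sketched in a few lines. This is a minimal illustration, not the code behind the figures: it assumes the cat-map matrix is [[1, 1], [1, 2]] acting mod 1, a 10 x 10 coarse graining, and a coarse-grained entropy of the form -sum p ln p over grains. The function names and the particular start and end grains are hypothetical choices made for this example.

```python
import numpy as np

def cat_step(x, y):
    """One cat-map step, assuming the matrix [[1, 1], [1, 2]] acting mod 1."""
    return (x + y) % 1.0, (x + 2.0 * y) % 1.0

def grain(x, y, mesh):
    """Index of the coarse grain (mesh x mesh squares) containing each point."""
    i = np.minimum((x * mesh).astype(int), mesh - 1)
    j = np.minimum((y * mesh).astype(int), mesh - 1)
    return i * mesh + j

def entropy(x, y, mesh):
    """Coarse-grained entropy -sum p ln p over occupied grains (assumed form)."""
    counts = np.bincount(grain(x, y, mesh), minlength=mesh * mesh)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum()) + 0.0  # "+ 0.0" turns -0.0 into 0.0

def two_time_run(n_trial, T, mesh, start, end, seed=0):
    """Start n_trial points uniformly in grain `start`, evolve T steps, and keep
    only the trajectories that finish in grain `end`. Because the gas is ideal,
    discarding the other points does not change the motion of those kept."""
    rng = np.random.default_rng(seed)
    x = (start[0] + rng.random(n_trial)) / mesh
    y = (start[1] + rng.random(n_trial)) / mesh
    history = [(x, y)]
    for _ in range(T):
        x, y = cat_step(x, y)
        history.append((x, y))
    keep = grain(x, y, mesh) == end[0] * mesh + end[1]
    return [(hx[keep], hy[keep]) for hx, hy in history]

# About 1 trial point in 100 lands in the chosen final grain.
history = two_time_run(n_trial=50_000, T=22, mesh=10, start=(3, 4), end=(7, 2))
S = [entropy(x, y, 10) for x, y in history]
print(len(history[0][0]), "points kept")
for t, s in enumerate(S):
    print(t, round(s, 3))
```

Storing the full history before filtering keeps the kept trajectories bit-for-bit identical to the ones that were tested against the final condition, which matters because rounding errors grow by a factor of about 2.618 per step.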

The entropy plot for this sequence of images shows complete temporal symmetry. This
is shown in Fig. 17. Notice that up to statistical fluctuations it is completely symmetric in
time. There is entropy decrease towards the end, but don't worry about the second law!
Those who are free from initial condition prejudice are exempt from that law.

FIG. 16: Time steps 19, 20, 21, and 22.

FIG. 17: Entropy as a function of time for the images shown in Figs. 14-16 (cat map, 500 points,
coarse grain mesh 10). Compare this dependence to that without future conditioning, shown in Fig. 7.

A remarkable feature of Fig. 17 is that the initial growth of entropy as a function of 
time (S(t)) is indistinguishable from that without conditioning (cf. Fig. 7). I'll show this in 
a moment in a graph that superimposes S(t) for several different final conditioning times 
("T"). This means that even though the points in the initial state (t = 0) are very carefully 
picked, you cannot tell. Or you cannot tell until around time 17, when their otherwise 
hidden property begins to become manifest. I call this a cryptic constraint — the system is 
subject to some constraint (the final conditioning), but you cannot tell. Every point that 
has been selected is selected with an accuracy of about $2.618^{-22} \approx 6.4 \times 10^{-10}$ (which is well
within MATLAB's precision). This is because the larger eigenvalue for the cat map is $\approx 2.618$,
and any error would be magnified by the 22nd power of that amount.
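
The arithmetic behind this accuracy estimate is short. As a sketch, assume the cat-map matrix is a symmetric integer matrix with trace 3 and determinant 1 (an assumption made here, consistent with the eigenvalue 2.618 quoted above); $\lambda_+$ is notation introduced for this note.

```latex
\begin{align}
  \lambda^2 - 3\lambda + 1 = 0
  \quad&\Longrightarrow\quad
  \lambda_+ = \frac{3 + \sqrt{5}}{2} \approx 2.618, \\
  \lambda_+^{-22} = e^{-22 \ln \lambda_+}
  &\approx e^{-21.17} \approx 6.4 \times 10^{-10} .
\end{align}
```

Equivalently, a small displacement along the expanding direction grows by a factor of about $1.6 \times 10^{9}$ over 22 steps, which leaves a wide margin relative to double precision (roughly $10^{-16}$).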

It is instructive to ask why this takes place. Here one can use ordinary statistical me- 
chanics reasoning but with a new twist. It's clear from S(t) that the system forgets its 
initial conditions in about 5 time steps; for this coarse graining that's the relaxation time.
Another way to say this is that wherever the point started, it can be anywhere in $I^2$ within
about 5 time steps. This means that the normal relaxation, despite the future conditioning,
can be understood as a kind of forgetting of the final conditioning. Not only can the system
get anywhere in 5 time steps, it can get to anywhere. So 5 steps before the end there is little
indication of its future trajectory. 

The manifest symmetry of the graph with respect to the pattern of initial relaxation and 
final un-relaxation is due to the similarity of the dynamics in both directions, each of them 
dominated by the large eigenvalue of the cat map matrix. Early on, this symmetry led people 
to suggest that if Gold was right, sentient beings would always see an expanding universe. 
That is, the arrow of time in the contracting phase, that approaching lower entropy, would 
be reversed. 

I've recently provided further evidence of this by showing that a fundamental notion, 
macroscopic causality, can be derived. Often this is postulated as the basic arrow of time,
but in the present context, where the arrow itself is something to be derived, causality also
is not fundamental. Of course it's not so easy to say what you mean by causality, and once
you've said it, you've often not so much given a definition as launched a philosophical debate.
(Note, I'm talking about macroscopic causality, not the commutation relation properties
that enter in microscopic causality.) The way I chose to interpret this idea was to imagine a
double conditioned dynamics in which at one particular time the dynamical rule is changed,
but for which the boundary conditions remain the same. That is, you do T steps of cat
map dynamics with a particular pair of boundary conditions. Then, for the same boundary
conditions you do $t_0$ steps of cat map dynamics, one step of something else (e.g., a different
measure preserving matrix), and $T - t_0 - 1$ steps of cat map dynamics. Then you compare
the macroscopic behavior for the two. It turns out that if $t_0$ comes during an interval of
non-constant entropy, the change in macroscopic variables is always later in the time direction
of entropy increase. This is illustrated in Fig. 18, with my standard laboratory technique,
the cat map, although there is no problem showing this analytically as well [28]. (Nor is
this difficult to understand intuitively, however surprising it may seem on first impression.)

FIG. 18: Causality is an effect: Entropy for the double conditioned cat map. On the left there is no
perturbation. The middle figure reproduces the graph on the left, but with the entropy dependence
for an early time perturbation superimposed. On the right is again shown the unperturbed graph,
this time with the entropy time dependence for a late-time perturbation.
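
A numerical version of this comparison can be sketched as follows. It is an illustration under stated assumptions, not the computation behind Fig. 18: the cat-map matrix is taken to be [[1, 1], [1, 2]], the "something else" step is taken to be the identity (a hypothetical choice that is trivially measure preserving and makes the effect easy to see), and the entropy is the coarse-grained -sum p ln p. All function names, grains and sample sizes are choices made for this example.

```python
import numpy as np

MESH = 10  # coarse grains per side (assumed)

def cat(x, y):
    """Cat-map step, assuming the matrix [[1, 1], [1, 2]] acting mod 1."""
    return (x + y) % 1.0, (x + 2.0 * y) % 1.0

def pause(x, y):
    """Hypothetical replacement step: the identity map."""
    return x, y

def grain(x, y):
    i = np.minimum((x * MESH).astype(int), MESH - 1)
    j = np.minimum((y * MESH).astype(int), MESH - 1)
    return i * MESH + j

def entropy(x, y):
    counts = np.bincount(grain(x, y), minlength=MESH * MESH)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum()) + 0.0

def conditioned_entropy(steps, start, end, n_trial=100_000, seed=0):
    """S(0..T) for points that start in grain `start` and, after applying the
    maps in `steps` in order, finish in grain `end` (rejection sampling)."""
    rng = np.random.default_rng(seed)
    x = (start[0] + rng.random(n_trial)) / MESH
    y = (start[1] + rng.random(n_trial)) / MESH
    history = [(x, y)]
    for step in steps:
        x, y = step(x, y)
        history.append((x, y))
    keep = grain(x, y) == end[0] * MESH + end[1]
    return np.array([entropy(hx[keep], hy[keep]) for hx, hy in history])

T, START, END = 22, (3, 4), (7, 2)
S_plain = conditioned_entropy([cat] * T, START, END)
# Early perturbation: rule changed for the step taken at t0 = 2.
S_early = conditioned_entropy([cat] * 2 + [pause] + [cat] * (T - 3), START, END)
# Late perturbation: rule changed for the step taken at t0 = T - 3.
S_late = conditioned_entropy([cat] * (T - 3) + [pause] + [cat] * 2, START, END)

print(" t  plain  early  late")
for t in range(T + 1):
    print(f"{t:2d}  {S_plain[t]:.3f}  {S_early[t]:.3f}  {S_late[t]:.3f}")
```

In this sketch the early perturbation leaves the entropy at times up to $t_0$ statistically unchanged and alters it afterwards, while the late perturbation leaves the final few steps statistically unchanged and alters the entropy at and before $t_0$. In both cases the macroscopic change sits on the side toward which entropy increases.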

In Fig. 19 I show S(t) for a variety of conditioning times. As is evident, for times less than 
ten — twice the relaxation time — there is a perceptible difference in macroscopic behavior 
brought about by the conditioning. The system does not relax normally, in fact it does so 
more slowly than usual (note too that once T > 10, you really can't tell what T is). 

This result has physical implications. Determining the cosmology in which we dwell is a 
central scientific challenge. The idea of an oscillating universe is still on the table, despite 
the discovery of accelerated expansion. What the ideas inherent in Fig. 19 suggest is a 
conceptually independent way to get a handle on our cosmology. The first suggestion of this 
sort was by John Wheeler [29, 30]. He proposed that you go into your lab and check the 
decay of Sm-147. If you are careful (and if the time until the big crunch is not much more
than 300 Gyr) you'd see not only an exponential with decay time of about 100 Gyr, but
an exponential going the other way, altogether a hyperbolic cosine. Because of the future 
constraint (back to primordial hydrogen), the sample would be relaxing more slowly than 
you'd expect it to. The beauty of this idea is that it is a conceptually independent way to 
study cosmology, based as it is on statistical mechanics, and of course on the assumption 
that the arrow of time indeed follows the arrow of cosmological geometry.

FIG. 19: Entropy for the macroscopically low entropy double conditioned cat map, with various
conditioning times.
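
The hyperbolic cosine in Wheeler's proposal follows from one line of algebra. As a sketch, assume the conditioning is symmetric, so that the relaxing quantity $n(t)$ is the sum of a term decaying away from the initial time $0$ and an equal-amplitude term growing toward the final time $T$, both with the same time constant $\tau$. The symbols $n$, $A$ and $\tau$ are notation introduced here, and the equal amplitudes are an assumption.

```latex
\begin{align}
  n(t) &= A\,e^{-t/\tau} + A\,e^{-(T-t)/\tau}
        = 2A\,e^{-T/2\tau}\cosh\!\left(\frac{t - T/2}{\tau}\right), \\
  \frac{\text{growing term}}{\text{decaying term}} &= e^{-(T-2t)/\tau}
        \approx e^{-3} \approx 0.05
        \quad \text{for } t \ll T,\ T = 300\ \text{Gyr},\ \tau = 100\ \text{Gyr}.
\end{align}
```

Under these assumptions the correction to a pure exponential is at the level of a few percent early on, and it shrinks exponentially as $T/\tau$ grows.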

Unfortunately there are technical difficulties with this idea. By looking at two time 
conditioning when a small piece is broken off from the main trend [31], I found that a 
laboratory sample would still decay exponentially, and to see the cosh you'd need to look 
at cosmological abundance. (Wheeler also retracted his idea [32].) Instead I proposed 
to look at other dynamical variables with extremely long relaxation times. I believe the 
best candidate is large scale structure, that is the dynamics of the galaxies themselves. 
Unfortunately it's not clear what relaxation means in this context and because of a lack of 
information concerning dark matter and dark energy, it would be difficult to say how quickly 
something should decay. Nevertheless, there's a lot of work on the dynamics of large scale 
structure, including careful simulations of the motion of many particles (galaxies), including 
the effects of dark matter, etc. Whether at this point there is any variable whose behavior is 
well enough understood to say that deviations from observation can be attributed to future 
conditioning, I doubt. What I believe is worth doing, is trying to find a variable that on 
the one hand would be sensitive to distant future conditioning, but on the other would be 
relatively insensitive to the major unknowns that still plague cosmology. Again, I do not 
know if it's overly optimistic to expect to find anything. 

B. Symmetric behavior of a non-time-reversal invariant dynamical map 

In Sec. V we discussed the arrow of time associated with CP violation. At that point 
I expressed the opinion that I did not believe that arrow had anything to do with our 
thermodynamic arrow. Besides the smallness of the effect, the asymmetry has no apparent 
connection to dissipation. There is a different rule for going forward and back, but that in 
itself does not make your coffee cool. 

I will next describe a variant cat map dynamics that does not have time reversal symme- 
try. Using this asymmetric dynamical law I have done much the same thing that was just 
done for the cat map itself: two time conditioning with symmetric boundary conditions. 
The result is shown in Fig. 20. As you can see, despite the non-invariance, this map is as
symmetric or asymmetric as a mapping conventionally possessing time reversal invariance.

An asymmetric cat map. To define an asymmetric cat map, you have to know what
it means for a dynamical law to be time reversal invariant. The way I like to state this is to
have only forward time propagation [33]: You have a space $\Omega$ (for the cat map $\Omega = I^2$), a time
evolution operator $\phi$ (for some particular time interval [34]), and a putative time-reversal
operator, $T$ (for which $T^2 = 1$ on $\Omega$). What you need for time reversal invariance is that if
you start with any state, $\omega \in \Omega$, propagate it forward with $\phi$, then apply $T$, then propagate
with $\phi$ and finally hit it again with $T$, you come back to $\omega$. In symbols: $T\phi T\phi = 1$, which,
for invertible $\phi$ is equivalent to the more familiar $T\phi T = \phi^{-1}$. For classical mechanics $\Omega$
is phase space and $T$ the reversal of velocities. In quantum mechanics there is the famous
anti-unitary operator involving complex conjugation. For the cat map you can find 2-by-2
matrices that do the job, coupled with the mod 1 operation. For example,



satisfies MTMT = 1, with M the cat-map matrix (Eq. (25)). To produce a time evolution 
operator that is not time-reversal invariant, we introduce an operator, C ("Crunch"), 



again acting modulo 1, with $\alpha \in \mathbb{R}$. Like M, C is measure preserving and 1-to-1, but it has
the advantage that it has a continuously variable (non-integer) parameter, $\alpha$.

For $\phi$ we take the product [Crunch] $\circ$ [Cat map], with the mod 1 operation performed
after each step. (Similarly when $T$ is applied, one takes mod 1 before and after.) If you
ignored the mod 1 operation, you would find that as matrices, not only is $T^2 = 1$, but
$\phi T \phi T = 1$, as well. (This works for $\phi$ replaced by the cat map alone, which is why
the cat map is time-reversal invariant.) However, for non-integer $\alpha$, the product of these
operations is not the identity, and $\phi$ is not a time reversal invariant operation (I have not
proved that there is no non-linear transformation that does the job, so this demonstration
is not complete).
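
The time-reversal condition for the cat map alone is easy to check numerically. The sketch below is illustrative only. It assumes the cat-map matrix is [[1, 1], [1, 2]], and it uses a candidate reversal matrix [[1, 1], [0, -1]] that was found by hand for this example and is not claimed to be the matrix intended in the text. The Crunch operator is not included, since its explicit form is not reproduced here.

```python
import numpy as np

M = np.array([[1, 1], [1, 2]])    # assumed cat-map matrix
T = np.array([[1, 1], [0, -1]])   # hypothetical time-reversal candidate

def apply(A, pts):
    """Apply the integer matrix A to points on the unit torus, then take mod 1."""
    return (pts @ A.T) % 1.0

def torus_distance(a, b):
    """Largest coordinate-wise separation of two point sets, measured mod 1."""
    d = np.abs(a - b) % 1.0
    return float(np.minimum(d, 1.0 - d).max())

identity = np.eye(2, dtype=int)
t_squared_is_one = np.array_equal(T @ T, identity)
mtmt_is_one = np.array_equal(M @ T @ M @ T, identity)

# Propagate, reverse, propagate, reverse: every state should come back.
rng = np.random.default_rng(1)
omega = rng.random((1000, 2))
back = apply(T, apply(M, apply(T, apply(M, omega))))
round_trip_error = torus_distance(back, omega)

print("T^2 = 1:", t_squared_is_one)
print("MTMT = 1:", mtmt_is_one)
print("largest round-trip error on the torus:", round_trip_error)
```

Because both matrices have integer entries, taking mod 1 after each step does not spoil the matrix identity; this is the property that a non-integer parameter in the Crunch step removes.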

Using this $\phi$, we would like to know whether, on its own, it provides an arrow of time. One
way to check is to see whether with two-time boundary conditions it can undo equilibrium 
just as the time-reversal invariant cat map does. That it has the same symmetry properties 
is shown in Fig. 20. 

Two-time boundary conditions: Counting states in a cubic centimeter of air 

How restrictive is the demand for low entropy two-time boundary conditions? Can all the 
gas in a room re-collect in a little corner? In classical mechanics if a system is sufficiently 
chaotic you can get it from anywhere to anywhere (in phase space) on time scales longer 
than its relaxation time (where that time can be defined with respect to the level of detail 
with which you have defined "anywhere").

Quantum mechanics however imposes a scale for the fine graining of phase space, namely
$2\pi\hbar$. In a 2n-dimensional phase space each region of volume $(2\pi\hbar)^n$ gives rise to an
independent Hilbert space direction (or basis vector). So for quantum mechanics it's not enough
to find a small region that gets you where you want to go; that region requires a minimum
volume before you can satisfy the boundary conditions.

FIG. 20: Time symmetric behavior with two-time boundary conditions. On the left is the entropy
for the usual doubly conditioned cat map. On the right the map that is used is the $\phi$ of Sec. V,
which is not invariant under time reversal, although it is non-dissipative and measure preserving.

Using the quantum scale I estimate the number of microstates available to the atoms in 
a cubic centimeter of air. Actually, I'll estimate a smaller number (by a few million powers 
of ten), the states associated with a monatomic ideal gas of atomic weight 30, at room 
temperature and at atmospheric pressure. Let $N$ be the number of particles in the given
volume. Let $p$ be a characteristic momentum scale for the particles.
Exercise: Show that $N \approx 2.42 \times 10^{19}$.

Solution: Use $PV = NkT$ with $V = 10^{-6}$ m$^3$, $T = 300$ K, $P = 1$ atm.

To estimate the number of microstates we can use standard formulas for the entropy of
an ideal gas, taking into account particle identity. From Baierlein [35], p. 103-104, Eq. 5.41:

$$ S = kN \ln\left[ \frac{(V/N)}{\lambda_{\rm th}^3}\, e^{5/2} \right], \qquad \text{with } \lambda_{\rm th} = \frac{h}{\sqrt{2\pi m k T}} . $$

Letting $\mathcal{N}$ be the number of states,

$$ \mathcal{N} = \exp(S/k) = \exp\left( N \log\left[ \frac{(V/N)}{\lambda_{\rm th}^3}\, e^{5/2} \right] \right) \qquad (89) $$

In MKS units: $\lambda_{\rm th} = 1.8403 \times 10^{-11}$ m, $V/N = 4.1667 \times 10^{-26}$ m$^3$, and

$$ \mathcal{N} = 10^{10^{20.2784}} \qquad (90) $$

Now consider various initial and boundary value problems.

Suppose we placed the N particles in a box 0.25 cm on a side, rather than 1 cm. The
ratio of the number of states in the 1/4 cm case ("q", quarter) to that in the 1 cm case
("o", one) is (from Eq. (89))

$$ \frac{\mathcal{N}_q}{\mathcal{N}_o} = \exp\left( N (\log V_q - \log V_o) \right) \qquad (91) $$

This will be a tremendous reduction of phase space numbers, namely a factor of $10^{(10^{19.64})}$.
Comparing however to Eq. (90), this is seen to be a minuscule effect, and a number much
larger than 1 remains (much, MUCH larger).

Now another scenario: imagine an initial condition in which the particles were in a region 
0.5 cm on a side. This cuts the phase space by a factor ("h" for half) 

$$ \frac{\mathcal{N}_h}{\mathcal{N}_o} = \exp\left( N (\log V_h - \log V_o) \right) \qquad (92) $$

And now finally we demand a particular final condition: that the particles also find them- 
selves in a region 1/2 cm on a side, say 10 minutes later (so this is a two-time boundary 
condition, requiring confinement in a 1/2 cm box twice). Dynamically there should be clas- 
sical states that get there, but what we want to check now is whether there are enough to 
leave a phase space volume containing more than one quantum state. But we already know 
how much this will cut down phase space: it's the factor just given in Eq. (92). 

To evaluate this, I'll stop playing with numbers and observe that the ratio in Eq. (91) is
just $\exp(-3N \log 4)$, while that in Eq. (92) is $\exp(-3N \log 2)$. Squaring the second number
gives the first! In other words, the cost of the final condition is no worse than a slightly
more stringent initial condition. At the macroscopic scale, your pump would have to work a 
little harder. In this sense, two-time boundary value problems have plenty of solutions. One 
might still ask why we never see this happen spontaneously. The answer is that although 
the entropy decrease is not much, it's still far too large for a spontaneous fluctuation. The 
odds of such a thing are given by the ratio in Eq. (92). 
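
The numbers in this estimate can be checked with a short script. This is a sketch under stated assumptions: it uses CODATA values for the constants, takes the entropy in the Sackur-Tetrode form S/k = N [ln((V/N)/lambda^3) + 5/2], and uses T = 300 K and P = 1 atm exactly, so the last digits need not match the rounded values quoted above. The function and variable names are choices made for this example.

```python
import math

# Physical constants in SI units (CODATA values).
K_B = 1.380649e-23        # Boltzmann constant, J/K
H = 6.62607015e-34        # Planck constant, J s
AMU = 1.66053906660e-27   # atomic mass unit, kg
ATM = 101325.0            # standard atmosphere, Pa

def particle_number(P, V, T):
    """Ideal gas law, N = P V / (k T)."""
    return P * V / (K_B * T)

def thermal_wavelength(m, T):
    """lambda_th = h / sqrt(2 pi m k T)."""
    return H / math.sqrt(2.0 * math.pi * m * K_B * T)

def ln_states(N, V, m, T):
    """ln(number of states) = S/k for a monatomic ideal gas."""
    lam = thermal_wavelength(m, T)
    return N * (math.log((V / N) / lam**3) + 2.5)

T = 300.0           # K
V = 1.0e-6          # one cubic centimeter, in m^3
m = 30.0 * AMU      # atomic weight 30
N = particle_number(ATM, V, T)
lam = thermal_wavelength(m, T)
ln_total = ln_states(N, V, m, T)

# Write the number of states as 10**(10**x_states).
x_states = math.log10(ln_total / math.log(10.0))

# Confining the gas to a cube with side reduced by a factor f divides the
# number of states by exp(3 N ln f), at fixed N and T.
ln_cost_quarter = 3.0 * N * math.log(4.0)
ln_cost_half = 3.0 * N * math.log(2.0)
x_quarter = math.log10(ln_cost_quarter / math.log(10.0))

# Two-time condition: half-size box at both the initial and the final time.
ln_remaining = ln_total - 2.0 * ln_cost_half

print(f"N            = {N:.3e}")
print(f"lambda_th    = {lam:.4e} m")
print(f"states       = 10^(10^{x_states:.3f})")
print(f"quarter box  : reduction by 10^(10^{x_quarter:.3f})")
print(f"ln(states left after two half-box conditions) = {ln_remaining:.3e}")
```

The two half-box conditions together cost the same factor as the single quarter-box condition, since 2 x 3N ln 2 = 3N ln 4, and the logarithm of the remaining number of states stays enormously positive.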



VIII. HOW DO YOU DEFINE MACROSCOPIC? . . . WITH PRACTICAL CONSEQUENCES

This work is motivated by two problems. 

• The second law of thermodynamics: Heat cannot be converted to work, entropy
increases (with appropriate delimiters). But the work/heat distinction, or the
micro/macro distinction, depends on coarse grains (or their equivalent). What defines
coarse grains?

• Nonequilibrium statistical mechanics, and in particular phase transitions 

— For example, metastability is poorly described by equilibrium statistical mechan- 
ics, because of the central role played by the (inappropriate for metastability) 
thermodynamic limit. 

When I first began working on this question a friend chastised me: this is really a waste 
of time. (I think the "really" meant to suggest that this was a waste of time even by my 
usual standards. And this is a friend.) 

It turns out however that the issue of defining coarse grains is part of a general problem, 
both philosophical and practical. One related and currently fashionable issue is the finding 
of community structure in networks. As discussed in [36], this is also related to how the 
mind forms conceptual categories, "words" (in a generalized sense, since mice and worms 
also form such categories (perhaps only on the time scale of the species)). 
A. The role of time

It is my thesis that the primary criterion for the definition of coarse grains is time. In 
studying non-equilibrium statistical mechanics, B. Gaveau and I developed the impression 
that certain slowly changing quantities that emerged in our formalism played the role of 
observables. This led us to ways to make the definition of coarse grains precise. But before 
going into this, I want to mention that although our views were formed by the problems we 
were working on, the perspective, the relation of macroscopic to (relatively) slow, was far 
from original with us. Here's a quote from the introductory pages of Landau and Lifshitz, 
Statistical Mechanics [37], 

. . . Owing to the comparative slowness of chemical reactions, equilibrium 
as regards the motion of the molecules will be reached . . . more rapidly than 
equilibrium as regards . . . the composition of the mixture. This enables us to 
regard the partial equilibria ... as equilibria at a given . . . chemical composition. 

The existence of partial equilibria leads to the concept of macroscopic states 
of a system. Whereas a mechanical microscopic description . . . specifies the 
coordinates and momenta of every particle, ... a macroscopic description is 
one which specifies the mean values of the physical quantities determining a 
particular partial equilibrium, . . . 



B. The framework: stochastic dynamics 

But before implementing grain definition, I need to explain the formalism. I will also 
take the opportunity to talk about phase transitions, which will be a way of confirming that 
our "observables" are related to the usual notions of emergent macroscopic concepts. 

I introduce basic notions of stochastic dynamics. 

The dynamics takes place on a state space X of cardinality $N < \infty$. The motion is a
Markov process, $\xi(t)$, in X with transition matrix R:

$$R_{xy} = \Pr(x \leftarrow y) = \Pr\big(\xi(t+1) = x \mid \xi(t) = y\big). \tag{93}$$



Thus if p(x, t) is the probability distribution on X at time t, the distribution at t + 1 is
$p(x, t+1) = \sum_y R_{xy}\, p(y, t)$. The matrix R is assumed to be irreducible, which implies that
it has a unique eigenvalue 1 and a strictly positive eigenvector $p_0$:

$$\sum_x R_{xy} = 1 \quad\text{and}\quad \sum_y R_{xy}\, p_0(y) = p_0(x) > 0 \ \ \forall x. \tag{94}$$
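As a concrete illustration of this convention, here is a minimal sketch with a made-up 3-state matrix (the numbers are assumptions, not from the text). Columns, not rows, sum to one, and the stationary state is approximated by simply iterating the dynamics.

```python
# Column-stochastic convention: R[x][y] = Pr(y -> x).
# The entries below are a made-up example.
R = [
    [0.90, 0.20, 0.10],
    [0.05, 0.70, 0.30],
    [0.05, 0.10, 0.60],
]

def step(R, p):
    """One time step: p(x, t+1) = sum_y R[x][y] p(y, t)."""
    n = len(R)
    return [sum(R[x][y] * p[y] for y in range(n)) for x in range(n)]

def stationary_state(R, n_steps=2000):
    """Approximate p_0 by iterating from the uniform distribution."""
    n = len(R)
    p = [1.0 / n] * n
    for _ in range(n_steps):
        p = step(R, p)
    return p

column_sums = [sum(R[x][y] for x in range(3)) for y in range(3)]
p0 = stationary_state(R)
print(column_sums)  # each column sums to 1
print(p0)           # strictly positive, sums to 1, unchanged by step()
```

Plain iteration is used here only because it needs no linear-algebra library; any eigenvector routine would serve equally well.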

By the Perron-Frobenius theorem [38, 39], all eigenvalues of R lie on or inside the unit
circle in the complex plane. These eigenvalues $\lambda_k$ are ordered by decreasing modulus and
increasing phase, if applicable: $\lambda_0 = 1 \ge |\lambda_1| \ge |\lambda_2| \ge \dots$. Because R need not be symmetric
(and generally won't be) there are distinct right and left eigenvectors, $p_k$ and $A_k$:

$$R\, p_k = \lambda_k p_k, \qquad A_k R = \lambda_k A_k, \qquad k = 0, 1, \dots. \tag{95}$$

Although R may not be diagonalizable (and may require a Jordan form), we assume that 
for the eigenvalues that concern us (those near 1) a spectral decomposition can be used. 
The eigenvectors are orthonormal in the following sense:

$$\langle A_k | p_\ell \rangle = \delta_{k\ell},$$

which leaves undetermined a multiplicative factor for each k. The stationary state $p_0$ is
naturally normalized by $\sum_x p_0(x) = 1$, so that $A_0(x) = 1\ \forall x \in X$. This leads to several
normalization options:

• An "$L^1$" perspective: $\sum_x |p_k(x)| = 1$.

• Same as for $A_0$: $\max_x |A_k(x)| = 1$.

• An "$L^2$" perspective, motivated by the fact that when detailed balance holds, $A_k(x) =
p_k(x)/p_0(x)$. (Detailed balance means $R_{xy}\, p_0(y) = R_{yx}\, p_0(x)$.) Then the following
normalization is natural: $\sum_x p_0(x) A_k(x)^2 = 1$.

What will be seen is that the slow (lowest k) left eigenvectors of R are the macroscopic 
observables. Our take on phase transitions highlights this. 

C. Phase transitions 

An idea that goes back to the Onsager solution of the 2-D Ising model is that phase tran- 
sitions result from eigenvalue degeneracy (for this case the degeneracy is in the eigenvalues 
of the "transfer matrix" [40]). This was especially advocated by Marc Kac, with a variety 
of examples. The problem with this dream was that not every phase transition came with 
a linear operator. 

"R" provides such an operator with great generality. 

We now consider a situation in which there is eigenvalue degeneracy [41, 42]. Assume $\lambda_1$
is real and let there be a range of times t such that

$$1 - \lambda_1^t = \epsilon \ll 1 \quad\text{and}\quad |\lambda_2^t| \ll 1.$$

A basic relation

The probability distribution at time t, given that a particle was at y at time 0, is

$$p_y^t(x) = \sum_u R^{t-1}_{xu} R_{uy} = R^t_{xy}.$$

Since $A_1$ is orthogonal to the strictly positive $p_0$, we deduce that $A_{1M} \equiv \max_x A_1(x) > 0$
and $A_{1m} \equiv \min_x A_1(x) < 0$. Furthermore, since $|X| < \infty$, these extreme values are assumed
on points of the space, $x_M$ and $x_m$ (which may not be unique). The eigenvalue relation for
$A_1$ implies

$$\lambda_1^t A_1(x_M) = \sum_x A_1(x)\, p^t_{x_M}(x). \tag{96}$$

Divide this by $A_{1M}$ and subtract it from $1 = \sum_x p_y^t(x)$. Do the same for $A_{1m}$ and $x_m$. This
yields

$$1 - \lambda_1^t = \sum_x p^t_{x_M}(x) \left[ 1 - \frac{A_1(x)}{A_{1M}} \right], \qquad
1 - \lambda_1^t = \sum_x p^t_{x_m}(x) \left[ 1 - \frac{A_1(x)}{A_{1m}} \right]. \tag{97}$$
These relations provide a quantitative measure for something I noticed long ago about
stochastic theories of metastable states: the next to slowest ($\lambda$ closest to 1) left eigenvector
is approximately constant on the phases (including the metastable one).

Eq. (97) explains this feature. Within the sums, all summands are positive. Therefore
if the left hand side is small ($1 - \lambda_1^t \ll 1$) then either $p^t_{x_M}(x)$ is small or $[1 - A_1(x)/A_{1M}]$ is.
This implies that if the system is likely to go from $x_M$ to x in the time t, then $A_1(x)$
is close to $A_{1M}$. The same observations hold for $A_{1m}$ as well. That is, points that are
close dynamically have similar values of $A_1$. Note that the relation between dynamical
proximity and near values of A is not symmetric. The $A_1$ values are close if you can go easily
from $x_M$ to x, but sometimes you can go from x to $x_M$ but not the other way: in that case,
the A values may differ significantly.

This demonstration did not use the smallness of $\lambda_2$. Thus for any eigenvalue $\lambda$ and for
any time t, points that are close dynamically on that time scale (i.e., they can reach each
other) will have values of the associated A that are close to each other, where "close" means
that the distance is controlled by $1 - |\lambda|^t$.

I'll return to this point later when defining coarse grains. 

It is natural to define the phase in terms of the "observable" $A_1$, which can be considered
the name of the phase. For a number a define

$$I_M(a) = \left\{ x \in X : \frac{A_{1M} - A_1(x)}{A_{1M}} < a \right\}, \qquad
I_m(a) = \left\{ x \in X : \frac{A_{1m} - A_1(x)}{A_{1m}} < a \right\}.$$

For small enough a these sets will be disjoint. They are the phases.
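To see these sets emerge in practice, here is a minimal sketch on a made-up toy model: a nearest-neighbour walk on six sites whose middle bond is crossed only rarely. The hop and leak probabilities, the function names, and the use of power iteration with the constant left eigenvector projected out are all assumptions of this illustration, not the procedure of the text.

```python
def build_bottleneck_chain(n_half=3, hop=0.25, leak=0.005):
    """Column-stochastic R[x][y] = Pr(y -> x) for a walk on 2*n_half sites
    whose middle bond is crossed only with probability `leak`."""
    n = 2 * n_half
    R = [[0.0] * n for _ in range(n)]
    for y in range(n - 1):
        rate = leak if y == n_half - 1 else hop
        R[y + 1][y] += rate  # y -> y+1
        R[y][y + 1] += rate  # y+1 -> y
    for y in range(n):
        R[y][y] = 1.0 - sum(R[x][y] for x in range(n) if x != y)
    return R

def stationary_state(R, n_steps=20000):
    """Approximate p_0 by iterating the dynamics from the uniform state."""
    n = len(R)
    p = [1.0 / n] * n
    for _ in range(n_steps):
        p = [sum(R[x][y] * p[y] for y in range(n)) for x in range(n)]
    return p

def slowest_left_eigenvector(R, p0, n_steps=500):
    """Power iteration for A_1, projecting out A_0 = 1 at every step
    (assumes lambda_1 is real, positive and non-degenerate)."""
    n = len(R)
    A = [float(x) for x in range(n)]
    for _ in range(n_steps):
        A = [sum(A[x] * R[x][y] for x in range(n)) for y in range(n)]  # A R
        overlap = sum(A[x] * p0[x] for x in range(n))                  # <A|p_0>
        A = [a - overlap for a in A]
        scale = max(abs(a) for a in A)
        A = [a / scale for a in A]                                     # max norm
    return A

def phases(A1, a=0.1):
    """The sets I_M(a) and I_m(a) defined in the text."""
    A_max, A_min = max(A1), min(A1)
    I_M = [x for x, v in enumerate(A1) if (A_max - v) / A_max < a]
    I_m = [x for x, v in enumerate(A1) if (A_min - v) / A_min < a]
    return I_M, I_m

R = build_bottleneck_chain()
p0 = stationary_state(R)
A1 = slowest_left_eigenvector(R, p0)
I_M, I_m = phases(A1)
print([round(v, 3) for v in A1])
print(I_M, I_m)
```

In this toy model the two sets are the two halves of the chain, and $A_1$ varies by only a few percent within each half; which half is labelled $I_M$ depends on the arbitrary overall sign of $A_1$.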



D. Many phases, extension of the basic relation 

I use the "max" norm for the A's. 

The degeneracy assumption is extended: for some integer m, $\lambda_1, \lambda_2, \dots, \lambda_m$ are real and
close to $\lambda_0 = 1$. That is, there exists a range of t such that for some $\epsilon \ll 1$

$$1 - \lambda_k^t = O(\epsilon), \qquad 1 \le k \le m. \tag{98}$$

In much of our development we further assume that $|\lambda_{m+1}|$ is much smaller than $\lambda_m$,
that is,

$$|\lambda_{m+1}^t| = O(\eta) \ll 1. \tag{99}$$

As above, for $x, y \in X$, define

$$p_y^t(x) = R^t_{xy}. \tag{100}$$

$p_y^t(x)$ is the probability that a system in state y at time 0 is in x at time t.

Consider the following geometric construction in $\mathbb{R}^m$: for any $y \in X$, form the vector
$A(y) = (A_1(y), \dots, A_m(y)) \in \mathbb{R}^m$. This gives a set $\mathcal{A}$ of N vectors in $\mathbb{R}^m$.

Call this the observable representation of X.

Let $\hat{\mathcal{A}}$ be the convex hull of $\mathcal{A}$.

By the orthogonality relation, the vector $0 = (0, 0, \dots, 0)$ is in $\hat{\mathcal{A}}$:

$$\sum_y p_0(y)\, A(y) = 0, \tag{101}$$

so that 0 is a convex combination of the vectors of $\mathcal{A}$. As a consequence, one can find m + 1
points $y^*_\ell$, $1 \le \ell \le m + 1$, such that the vectors

$$E_\ell = A(y^*_\ell) \tag{102}$$

are extremal points of $\hat{\mathcal{A}}$, and such that 0 is a convex combination of them. There may be
several ways to choose the $\{y^*_\ell\}$, but in fact, the $\{E_\ell\}$, $1 \le \ell \le m + 1$, are uniquely defined
up to a small ambiguity. By the selection of the $E_\ell$, we can find $\mu_\ell$, $1 \le \ell \le m + 1$, with
$0 \le \mu_\ell \le 1$, $\sum_{\ell=1}^{m+1} \mu_\ell = 1$, such that

$$\sum_\ell \mu_\ell E_\ell = 0. \tag{103}$$

For technical reasons we make the following separation hypothesis:
Hypothesis S: For each $\ell$ let

$$\phi_\ell = \min_{k = 1, \dots, m,\ k \ne \ell} \| E_k - E_\ell \|, \tag{104}$$

and define

$$\Phi = \min_\ell \frac{\phi_\ell}{\| E_\ell \|}. \tag{105}$$

Then the hypothesis is that the extrema $y^*_\ell$ can be selected so that (in addition to Eq. (103))
they satisfy the following:

$$1 - \lambda_m^t = \epsilon \ll \Phi \le O(1). \tag{106}$$

We do not know at this point whether this hypothesis can be dispensed with or whether it 
is possible to nicely categorize the processes for which it holds. So far we have been able to 
show it to be superfluous in all cases that we studied in detail. 
We next observe that by definition

$$\lambda_k^t A_k(y^*_\ell) = \langle A_k | p^t_{y^*_\ell} \rangle, \tag{107}$$

so that for all $\ell$

$$\big( \lambda_1^t A_1(y^*_\ell),\ \lambda_2^t A_2(y^*_\ell),\ \dots,\ \lambda_m^t A_m(y^*_\ell) \big) = \sum_{y \in X} p^t_{y^*_\ell}(y)\, A(y). \tag{108}$$

Because $0 < \lambda_k^t < 1$, $k \le m$, the vector on the left side of Eq. (108) is in the convex hull $\hat{\mathcal{A}}$.
Using $\sum_y p^t_{y^*_\ell}(y) = 1$, we have from Eq. (108)

$$E_\ell - \big( \lambda_1^t A_1(y^*_\ell),\ \dots,\ \lambda_m^t A_m(y^*_\ell) \big) = \sum_y p^t_{y^*_\ell}(y)\, \big( E_\ell - A(y) \big). \tag{109}$$

This relation is an m-dimensional version of our fundamental relation Eq. (97). Like that
equation, the left hand side is small, with distances less than $1 - \lambda_m^t$. Unlike Eq. (97),
however, the right hand side of Eq. (109) is not manifestly a product of positive quantities. It
is possible to overcome this by going to each extremal point and defining a coordinate system
that forms a cone coming out of that extremal. In that coordinate system all quantities in
Eq. (109) are positive and one can deduce the constancy of the A's on each phase.
FIG. 21: Plot using the first two left eigenvectors ($A_1$ and $A_2$) of the transition matrix, R, for a
three-phase system. A circle is placed at each point $(A_1(x), A_2(x))$ for each of the N states, x, in
X. The lines connecting the circles are for visualization. The matrix R is generated by combining
4 blocks, 3 of which are random matrices, the fourth essentially zero. Then a bit of noise is
added throughout, with bigger terms for migration out of the fourth block. Finally the diagonal
is adjusted to make the matrix stochastic. This leads to a pair of eigenvalues near one. This
plot using the first two eigenvectors shows the extremal points to be clustering in three regions,
corresponding to the phases. The points not at the extremals represent the fourth block, all of
which head toward one or another phase under the dynamics. For the particular matrix chosen,
they are about as likely to end in one phase as another. See [42] for details.

Instead of going through the details, I present an intuitive argument for the reasonableness
of the conclusion that the $A_k$'s ($k \le m$) are effectively constant on the phases, and can
therefore, collectively, serve as the name of the phase.

The implication of having the A's constant on the phases is that all the points from a
single phase gather into one small region of $\mathbb{R}^m$. That this happens can be seen (for m = 2)
in Fig. 21. This is a plot of $A_1(x)$ versus $A_2(x)$. The vertices of the triangle that you see
are actually composed of many points (shown in more detail in Fig. 22).

FIG. 22: Detail of the upper left vertex in Fig. 21. In actuality the points in each phase cluster
closely together and more than one extremal might, in principle, occur. The precision is limited
by the non-negligible magnitudes of the quantities $1 - \lambda_2$ and $\lambda_3$.

Here is a way to understand this bunching. The phases are in a sense dynamically far from
one another. If you start in one phase you expect to stay there for a long while before going
to any other phase. This means that there is a restricted dynamics within that phase that
nearly conserves probability. That is, the restricted transition matrix satisfies $\sum_x R_{xy} \approx 1$,
where the sum over x is for x only within the phase and y is within the phase as well (which
implies that the constant vector on the phase is an approximate left eigenvector of eigenvalue
near 1). The bunching of points in one phase (as we define it) means that for all points in
that phase $A_k(x)$ has very nearly the same value (for every k = 1, . . . , m). Let's see why
that happens. Consider the eigenvalue equation for $A_k$ applied t times, where t is small
enough so $\lambda_k^t$ is still close to one, so close that we will now treat it as unity. Then

$$A_k(y) \approx \sum_x A_k(x)\, R^t_{xy}. \tag{110}$$

Now suppose y is reachable with significant probability from x in time t (so we would say 
that x and y are in the same phase). Then Eq. (110) can be thought of as an equation 
within the restricted dynamics. What it says then is that every one of the left eigenvectors
$A_k$, $1 \le k \le m$, provides an eigenvalue 1 for the restricted dynamics (up to a small escape
rate for transitions between phases). But this left eigenvector is unique and we already have
a candidate for it, namely a constant on the phase. It follows that each $A_k$ ($1 \le k \le m$) is
proportional to a constant — on each particular phase. 

Several other results hold that I will not prove here, but which I will state in the following 
subsections. 

E. Concentration of the probabilities on the phases 

As for the 2-phase case, all points in a phase cluster around the appropriate extremal. In
both cases, almost all the weight of the stationary measure ($p_0$) is in the phases. Here it is
important that $|\lambda_{m+1}|$ be small.
F. Basins of attraction

Our phases include, besides the usual states in a phase, part of the $O(\eta)$ basins of attraction
for those phases. A particular example occurs when the extremal point itself is not in
what one usually calls the phase but gets there in $O(\eta)$. In this respect the extremal is like
the points that are not uniquely identified with any phase and have non-zero probabilities for 
going to several. The difference between these intermediate points and a not-in-the-phase 
extremal is that the barycentric coordinates (see below) for an extremal have a single 1, 
with the other entries zero. 



G. Probability to arrive in a given phase — barycentric coordinates 

The points not in the phases lie within the simplex defined by the extremals. Their 
positions can therefore be written in barycentric coordinates with respect to the simplex. 
This is a collection of positive numbers adding to 1. These numbers turn out to be the 
probabilities of starting from a particular point and ending at the corresponding extremal. 

This can be seen by observing that for any y (not necessarily in a phase)

$$\lambda_k^t A_k(y) = \sum_x A_k(x)\, R^t_{xy} = \sum_x A_k(x)\, p_y^t(x). \tag{111}$$

Now assume that the $\lambda$'s are essentially 1 and group the x-sums into phases, making use of
the facts that little of the probability falls outside the phases and that the A's are constant
on the phases:

$$A_k(y) = \sum_\ell \left[\, \sum_{x \in X^{(\ell)}} p_y^t(x) \right] A_k(y^*_\ell). \tag{112}$$

The quantity inside the large square brackets is the total probability to go from y to any
point in the phase $X^{(\ell)}$. Call this $q_\ell$. Eq. (112) holds for each k = 1, . . . , m and we collect
these into a vector equation

$$A(y) = \sum_\ell q_\ell E_\ell. \tag{113}$$


This shows that the probability to end in a specific phase is the barycentric coordinate of 
the associated vector in the observable representation. 
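The statement can be tried out on the smallest possible case, m = 1, where the "simplex" is just the segment between the two extreme values of $A_1$. The sketch below uses a made-up 3-state chain: two long-lived states and one intermediate state that, by construction, jumps to the first with probability 0.3 and to the second with probability 0.7. All numbers and function names are assumptions of this illustration.

```python
eps, delta = 0.005, 0.005  # made-up escape and leak probabilities

# States: 0 = phase "a", 1 = intermediate point, 2 = phase "b".
# Column-stochastic convention: R[x][y] = Pr(y -> x).
R = [
    [1 - eps - delta, 0.3, eps],
    [delta,           0.0, delta],
    [eps,             0.7, 1 - eps - delta],
]

def stationary_state(R, n_steps=5000):
    """Approximate p_0 by iterating the dynamics from the uniform state."""
    n = len(R)
    p = [1.0 / n] * n
    for _ in range(n_steps):
        p = [sum(R[x][y] * p[y] for y in range(n)) for x in range(n)]
    return p

def slowest_left_eigenvector(R, p0, n_steps=200):
    """Power iteration for A_1 with the constant vector A_0 projected out."""
    n = len(R)
    A = [1.0, 0.0, -1.0]
    for _ in range(n_steps):
        A = [sum(A[x] * R[x][y] for x in range(n)) for y in range(n)]
        overlap = sum(A[x] * p0[x] for x in range(n))
        A = [a - overlap for a in A]
        scale = max(abs(a) for a in A)
        A = [a / scale for a in A]
    return A

p0 = stationary_state(R)
A1 = slowest_left_eigenvector(R, p0)

# Barycentric coordinates of the intermediate point with respect to the
# two extremals A1[0] and A1[2]: A1[1] = q_a * A1[0] + q_b * A1[2].
q_a = (A1[1] - A1[2]) / (A1[0] - A1[2])
q_b = 1.0 - q_a
print(q_a, q_b)  # close to the built-in branching probabilities 0.3 and 0.7
```

The agreement is approximate rather than exact because $\lambda_1$ is close to, but not equal to, one.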

An application of this result is a random walk on the landscape shown in Fig. 23. The 
stationary state is shown in Fig. 24. There are 4 regions of attraction, which we identify
as the "phases." The spectrum of the (225 by 225) generator of the stochastic dynamics is
[0, exp(-16.0), exp(-15.3), exp(-14.8), exp(+1.2), . . . ], so that it satisfies the conditions for
having 4 well-demarcated phases, which in this case represent regions of attraction. Finally
in Fig. 25 we show how our methods can be used to calculate the probability that from a
given initial condition one will arrive at one or another asymptotic state. This figure is the 
observable representation for this system and forms a tetrahedron (cf. Fig. 26, which was 
produced in a different way, but is a variation on the theme in that I only display the convex 
hull). Thus each circle in the graph represents a point on the 15 by 15 lattice and its location 
in the plot, when expressed in barycentric coordinates with respect to the extremals, gives 
its probability of reaching a particular phase. In the graph we do not identify the particular 
circles, but the same computer program that generated the graph can easily provide a table 
of probabilities for each initial condition. 
FIG. 24: Stationary distribution for the walk on the landscape shown in Fig. 23.

H. Hierarchical metastable states 

A variation of the "observable representation," which plots the vectors $A(x) =
(A_1(x), \dots, A_m(x))$, is to plot $(A_1(x)\lambda_1^t, \dots, A_m(x)\lambda_m^t)$. This allows a dynamical
image to be viewed. It is particularly informative in the situation where there is a hierarchical
relation among the metastable phases (as is supposed to occur in spin glasses). As a
function of time phases having the same ancestry merge, finally, as $t \to \infty$, coming together
in the stationary state. (For some of these applications the eigenvalues drop gradually, but
this can still be useful.)

FIG. 25: Observable representation, in $\mathbb{R}^3$, of the states for the walk on the landscape shown in
Fig. 23. Each circle represents a point on the 15 by 15 lattice and its position within the tetrahedron
(when expressed in barycentric coordinates with respect to the extremals) gives the probability of
starting at that point and arriving at one or another extremal.

FIG. 26: Convex hull of the set of points A(y) for $y \in X$. This is for a case of 4 phases and the
figure formed in $\mathbb{R}^3$ is a tetrahedron.

Here is a matrix explicitly possessing hierarchical structure: The overall W matrix has
the following form



and $\epsilon \ll \delta \ll 1$. ($\epsilon$ and $\delta$ are generic small matrices and are not all the same.) There are
6 phases; on a medium time scale three pairs decay into a common branch, subsequent to
which the three branches merge into a single trunk. Since we cannot image the 5-dimensional
structure, we take the projection of this motion (as a function of time) on a particular plane.
This is shown in Fig. 27, where the circles represent the original phases and the "x" is the
final state, (0, 0).

FIG. 27: Phases at successively later times for a hierarchical stochastic matrix. As explained in
the text this is a two-dimensional projection of the five dimensional plot of eigenvectors multiplied
by eigenvalue to the power t. On the shortest time scale there are 6 metastable phases (circles);
subsequently they merge into three and finally into a single stationary state.



IX. COARSE GRAINS

Our nonequilibrium formalism provides a framework within which to implement the physical
idea that time is related to "macroscopic." The first step is to define a metric on X
based entirely on dynamics.

Two points $x, y \in X$ are more like one another if their values of $A_k$ are closer; the larger
$|\lambda_k|$, the more important that closeness is. Define the following distance on X

(114)

(115)

FIG. 28: For grain diameter < a, the two routes from $p(t)$ to $\tilde p(t + 1)$ differ by at most a.

The parameter T in Eq. (115) reflects temporal precision.

A coarse graining is a disjoint partition of X. The grains will be called $\Delta$ or $\Delta_k$ and
satisfy $\cup_k \Delta_k = X$ and $\Delta_k \cap \Delta_{k'} = \emptyset$ for $k \ne k'$. Points within a grain are designated x or
$x_\Delta \in \Delta \subset X$.

Coarse graining of probabilities is straightforward:

$$\tilde p(\Delta) = \sum_{x \in \Delta} p(x). \tag{116}$$



This rule applies to row indices or kets. For column indices, or bras, there is a weighting
(necessary to make the coarse grained stationary state a stationary state of the coarse grained
evolution). For a grain $\Delta$, and for $y \in \Delta$, let

$$w_\Delta(y) = \frac{p_0(y)}{\tilde p_0(\Delta)} = \frac{p_0(y)}{\sum_{y' \in \Delta} p_0(y')}, \tag{117}$$

where $p_0$ is the stationary state of R. Thus the observable A(x) is coarse grained to become
$\tilde A(\Delta) = \sum_{y \in \Delta} w_\Delta(y) A(y)$.

With respect to the stochastic dynamics itself, an additional time-smearing is applied to
the stochastic matrix R. For fixed T define

$$\tilde R^T_{\Delta\Delta'} = \sum_{x \in \Delta} \sum_{y \in \Delta'} (R^T)_{xy}\, w_{\Delta'}(y). \tag{118}$$

The "T" in $\tilde R^T$ is not a power, but a reminder that the original R (on the right-hand side
of Eq. (118)) has been taken to the power T, i.e., that the time has been coarse grained. It
follows immediately that $\tilde p_0$ is the eigenvalue-1 right eigenvector of $\tilde R^T$.

An important property for a coarse graining is that it be close to "commuting," namely
that $\widetilde{R^T p}$ be close to $\tilde R^T \tilde p$. In [36] we show that if $\forall \Delta$, $[d_T(x, y) < a,\ \forall x, y \in \Delta]$, then
$|\widetilde{R^T p} - \tilde R^T \tilde p| < a$. This is illustrated in Fig. 28.
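The stationarity claim can be checked numerically. The sketch below is only an illustration under stated assumptions: the 4-state matrix, the choice of grains, and T = 2 are made up, and the coarse-grained matrix is taken to be the sum over rows in one grain and the $p_0$-weighted sum over columns in the other, which is how I read the coarse-graining rules of this section.

```python
# Made-up column-stochastic matrix, R[x][y] = Pr(y -> x).
R = [
    [0.70, 0.25, 0.05, 0.10],
    [0.20, 0.60, 0.05, 0.00],
    [0.05, 0.10, 0.60, 0.30],
    [0.05, 0.05, 0.30, 0.60],
]
grains = [[0, 1], [2, 3]]  # hypothetical partition of X
T = 2                      # hypothetical time-smearing parameter
n = len(R)

def mat_mul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_power(M, power):
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, M)
    return result

def stationary_state(M, n_steps=2000):
    p = [1.0 / n] * n
    for _ in range(n_steps):
        p = [sum(M[x][y] * p[y] for y in range(n)) for x in range(n)]
    return p

p0 = stationary_state(R)
RT = mat_power(R, T)

# Coarse-grained stationary state: sum of p_0 over each grain.
p0_coarse = [sum(p0[x] for x in g) for g in grains]

# Coarse-grained, time-smeared matrix: plain sum over row indices,
# p_0-weighted sum over column indices.
R_coarse = [[sum(RT[x][y] * p0[y] / p0_coarse[b]
                 for x in grains[a] for y in grains[b])
             for b in range(len(grains))]
            for a in range(len(grains))]

evolved = [sum(R_coarse[a][b] * p0_coarse[b] for b in range(len(grains)))
           for a in range(len(grains))]
print(p0_coarse, evolved)  # the two lists agree
```

The weighting on the column index is what makes the two printed lists agree; with an unweighted average over columns they would in general differ.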



Random walk: time recovers space 



Consider diffusion on a ring of sites, 1, . . . , N (with periodic boundary conditions).
Define the matrix B as translation by one step counterclockwise (e.g., $B_{12} = 1$). The
transition matrix is $R = aI + (1 - a)(B + B^T)/2$, with $0 < a < 1$ and T indicating matrix
transpose. The eigenvalues and eigenvectors are

$$\lambda_k = a + (1 - a)\cos\phi_k, \qquad \phi_k = \frac{2\pi k}{N}, \qquad k = 0, 1, \dots, \left[\frac{N}{2}\right],$$

$$u_k(x) = N^{-1/2}\cos(x\phi_k), \qquad v_k(x) = N^{-1/2}\sin(x\phi_k), \qquad k = 1, 2, \dots, \left[\frac{N}{2}\right], \quad x = 1, 2, \dots, N,$$

where [w] is the integer part of w. For even N, $v_k$ is zero for k = 0, N/2, and for odd N it
vanishes for k = 0. We use a variation of the distance function defined above: Let d(x, y)
be the square root of

$$\sum_{k \ge 1} \frac{|A_k(x) - A_k(y)|^2}{1 - |\lambda_k|^t}. \tag{119}$$

Using t = 1, with not too much manipulation one obtains

$$d_1^2(x, y) = \frac{2}{(1 - a)N} \sum_{k=1}^{[N/2]} \frac{\sin^2\big((x - y)\phi_k/2\big)}{\sin^2(\phi_k/2)}. \tag{120}$$

For large N this becomes

$$d^2(x, y) = \frac{1}{(1 - a)}\, \frac{2}{\pi} \int_0^{\pi/2} \frac{\sin^2\big((x - y)\theta\big)}{\sin^2\theta}\, d\theta = \frac{|x - y|}{1 - a} \tag{121}$$

(x and y are integers). Thus the dynamical distance reconstructs configuration space.
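The large-N statement is easy to test numerically. The following sketch assumes the finite-N sum of Eq. (120) carries the prefactor 2/((1 - a)N), runs over k = 1, . . . , [N/2], and uses $\phi_k = 2\pi k/N$; the values N = 1001 and a = 0.5 are arbitrary choices for illustration.

```python
import math

def ring_distance_squared(x, y, N, a):
    """Finite-N sum for d_1^2(x, y) on a ring of N sites (N odd here),
    under the assumptions stated in the lead-in."""
    total = 0.0
    for k in range(1, N // 2 + 1):
        phi = 2.0 * math.pi * k / N
        total += math.sin((x - y) * phi / 2.0) ** 2 / math.sin(phi / 2.0) ** 2
    return 2.0 * total / ((1.0 - a) * N)

N, a = 1001, 0.5
for separation in (1, 3, 6):
    d_squared = ring_distance_squared(0, separation, N, a)
    print(separation, d_squared, separation / (1.0 - a))
```

For separations small compared with N the computed $d^2$ tracks $|x - y|/(1 - a)$ closely, so the squared dynamical distance grows linearly with the separation on the ring.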



A. Making grains 

Defining a distance function provides a basis for the selection of coarse grains. This is 
not the end of the story, since you need a cluster algorithm to partition the space. We used 
a Monte Carlo scheme, with an annealing protocol, that minimizes the sum of the internal 
distances within each grain. The results for various examples are now shown. 



1. Dynamic Ferromagnet 
The one-dimensional Ising model has the energy function

$$H = -J \sum_{k=1}^{N} \sigma_k \sigma_{k+1} - B \sum_{k=1}^{N} \sigma_k, \tag{122}$$

where $\sigma_k$ is a ±1-valued spin at site k, J > 0 is the spin-spin coupling constant, and B
the external magnetic field. Index addition in Eq. (122) is modulo N. Many stochastic-dynamical
schemes are consistent with this energy and we use the following. A site, k',
is randomly selected and the change in energy that would occur if that spin were flipped,
$\Delta E = H(\sigma_{k'} \to -\sigma_{k'}) - H(\text{original})$, is calculated. The flip is then implemented with
probability $\alpha \exp(-\Delta E/2T)$, where T is a temperature parameter and $\alpha$ a global constant
chosen small enough so that the diagonal components of the transition matrix are non-negative.
A second allowed process is a double flip by opposite pointing neighbor spins. The
probability that this occurs is governed by an additional controllable parameter. In this
model there is no breakdown in analyticity, but for T < 1 and B = 0 the system will spend
most of the time strongly polarized. ($p_0$ as a function of magnetization is strongly peaked at
symmetric nonzero values.) Above T = 2 the system is found mostly at small magnetization
values.

TABLE II: Magnetization of the states within a grain. Each row corresponds to a grain and lists the
magnetization of each state in that grain. Note that within each grain the values of magnetization
are almost all equal, demonstrating that with a dynamically defined distance the order parameter,
magnetization, emerges naturally.

High temperature, T = 2.25        Low temperature, T = 0.8
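For a small ring the transition matrix of this subsection can be written out explicitly. The sketch below is a simplified illustration, not the model actually used: it includes single spin flips only (the double-flip process is omitted), absorbs the 1/N site-selection factor into the constant alpha, and uses made-up values N = 4, T = 1, alpha = 0.02. With these choices the matrix satisfies detailed balance with respect to the Boltzmann weights exp(-H/T).

```python
import math
from itertools import product

def energy(spins, J=1.0, B=0.0):
    """Ising energy on a ring; the neighbour index is taken modulo N."""
    n = len(spins)
    bonds = sum(spins[k] * spins[(k + 1) % n] for k in range(n))
    return -J * bonds - B * sum(spins)

def build_transition_matrix(n_sites=4, T=1.0, alpha=0.02, J=1.0, B=0.0):
    """Column-stochastic R[x][y] = Pr(y -> x), single spin flips only.
    Each flip has probability alpha * exp(-dE / (2 T))."""
    states = list(product((-1, 1), repeat=n_sites))
    index = {s: i for i, s in enumerate(states)}
    size = len(states)
    R = [[0.0] * size for _ in range(size)]
    for y, s in enumerate(states):
        for k in range(n_sites):
            flipped = s[:k] + (-s[k],) + s[k + 1:]
            dE = energy(flipped, J, B) - energy(s, J, B)
            R[index[flipped]][y] = alpha * math.exp(-dE / (2.0 * T))
        # The diagonal absorbs the remaining probability.
        R[y][y] = 1.0 - sum(R[x][y] for x in range(size) if x != y)
    return states, R

states, R = build_transition_matrix()
size = len(states)
boltzmann = [math.exp(-energy(s) / 1.0) for s in states]  # T = 1
worst = max(abs(R[x][y] * boltzmann[y] - R[y][x] * boltzmann[x])
            for x in range(size) for y in range(size))
print(size, min(R[y][y] for y in range(size)), worst)
```

The printed minimum diagonal entry shows whether alpha was chosen small enough; for larger N or lower T it has to be reduced accordingly.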

The transition matrix R is $2^N \times 2^N$, but it is sparse. As expected, for T < 1 there is near
degeneracy of $1 = \lambda_0$ and $\lambda_1$. (As T increases above 1, the gap, $1 - \lambda_1$, rapidly increases.)

For this example we calculated a set of grains. What emerged was a sorting according to 
magnetization, both above and below the transition. The magnetization of the states within 
each grain is given in Table II. For the ferromagnet the dynamical distance has revealed an 
"emergent" quantity, the magnetization. 

2. Words (organization of concepts) and the brain

For this example and the following one, the matrix R does not reflect a manifest dynamics, 
but takes some measure of the connectivity of a set and builds from it a stochastic matrix. 

The "states" for the following analysis are elementary units of the brain. They might 
be neurons, columns or some other structures. The unit can receive input from several 
sources and can discern the origin of each signal. One idea [43] of how the brain generates 
"words" is that this unit learns to recognize patterns of stimulation, which we will take 
to be temporal, sequential patterns, the list of successive signal sources. Our stochastic 
matrix, R, is associated with a single unit and focuses on its input. Take R to provide the
following information: it gives the probability that a signal comes from source x, given that 
the preceding signal was from source y. X is the list of sources, 1, . . . , N. 

Suppose the firing sequence 1-2-3 is common. According to [43], this is eventually rec- 
ognized by the unit and becomes a generalized "word." This not only categorizes data but 
can help in the interpretation of ambiguous signals. The biochemical mechanisms by which 
this recording takes place are mostly unknown. 

We now show how such a pattern, reflected in the R associated with this unit, will
create a coarse grain. For simplicity suppose that the only marked pattern that is present
is 1 → 2 → 3 → 1 . . . . This means that R has matrix elements $R_{13} \sim R_{32} \sim R_{21} \sim 1$, with
the other matrix elements smaller and essentially random.

Numerically we found that distances between 1, 2 and 3 were much smaller than those
between the others. Under coarse graining protocols the coarse grains that gave minimal
internal distances were those in which 1, 2, and 3 were put together. This property persisted
for the non-cyclic process 1 → 2 → 3.

A surprising feature was the role of noise. With zero noise, R (for the cyclic pattern) is
simply the permutation on 3 objects, and is the identity on all other states (not irreducible,
but never mind). The eigenvalues associated with the permutation are the cube roots of
unity, all of magnitude one. The left eigenvector for the root 1 is no problem as it is unity on 
all 3 of the important sources, hence yields small distance. For the other two root-of-unity 
eigenvalues, the norm of the eigenvalue is also unity, but the (left) eigenvectors have large 
differences (they too have components that are roots of unity). So the distance between 
these states would not be small (which we want it to be). 

With noise this situation clears up in an interesting way. A bit of noise affecting all 
sources first makes R irreducible, but also brings in all but one of the eigenvalues from the 
unit circle. The eigenvalues most affected by the noise are non-unit roots of unity, which 
have associated left-eigenvectors for which the participants in the cycle have quite different 
values. For all other left eigenvectors, the values taken on the cycle-participating sources 
are nearly the same. 
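The effect of noise on the root-of-unity eigenvalues can be seen in the smallest version of this setup. The sketch below keeps only the three sources in the cycle and models the noise as a uniform jump with probability eta; both simplifications, and the function names, are assumptions of this illustration rather than the model used for the numerical results.

```python
import cmath

def noisy_cycle(eta):
    """3-source toy model: with probability 1 - eta follow 1 -> 2 -> 3 -> 1,
    with probability eta jump to a uniformly random source.
    R[x][y] = Pr(next source is x | previous source was y)."""
    P = [[0, 0, 1],
         [1, 0, 0],
         [0, 1, 0]]  # R_21 = R_32 = R_13 = 1 (zero-based indices here)
    return [[(1 - eta) * P[x][y] + eta / 3 for y in range(3)] for x in range(3)]

def left_action(A, R):
    """(A R)_y = sum_x A(x) R[x][y]."""
    return [sum(A[x] * R[x][y] for x in range(3)) for y in range(3)]

omega = cmath.exp(2j * cmath.pi / 3)
A = [1, omega, omega ** 2]  # left eigenvector of the noiseless cycle

def cycle_mode(eta):
    """Eigenvalue belonging to A, and the residual of A R = lambda A."""
    AR = left_action(A, noisy_cycle(eta))
    lam = AR[0] / A[0]
    residual = max(abs(AR[y] - lam * A[y]) for y in range(3))
    return lam, residual

for eta in (0.0, 0.1):
    lam, residual = cycle_mode(eta)
    print(eta, abs(lam), residual)

# Components of this eigenvector on two cycle members differ by sqrt(3).
print(abs(A[0] - A[1]))
```

In this toy model the eigenvector is unchanged by the noise while its eigenvalue shrinks from modulus 1 to 1 - eta, so any distance that weights differences in $A_k$ by a power of $|\lambda_k|$ gives this mode less and less influence as time goes on.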

3. The karate club 

This is an example studied as a test of the formation of community structure in a small 
network. Many years ago, Zachary [44] studied the breakup of a particular karate club into 
two factions. He also managed to define a level of relationship between individuals, and 
since then generations of those studying community structure in networks have been using 
these relationships to predict the details of the split, with more or less success. 

Starting from the adjacency matrix $B_{xy}$, describing which individuals are closest, the
stochastic matrix can be defined as $R_{xy} = B_{xy}/n_y$ (no summation) where $n_y = \sum_x B_{xy}$.
This immediately leads to the spectral properties of R and to the derived metric. A coarse 
graining into two grains using this algorithm either gives the correct split, or misses by a 
single individual, the same individual that in other methods also appears to behave in a 
maverick fashion. (In our coarse graining more than one outcome is possible because of the 
Monte Carlo method and also because different temporal parameters were used as well as 
slight variations in the metric itself.) 
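The construction of R from an adjacency matrix can be sketched on a small made-up network. The graph below (two triangles joined by one tie) is hypothetical and is not Zachary's data, and the split by the sign of the slowest left eigenvector is a simple stand-in for the Monte Carlo annealing over the dynamical metric described in the text.

```python
# Hypothetical 6-person network: two triangles joined by a single tie.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
B = [[0] * n for _ in range(n)]
for x, y in edges:
    B[x][y] = B[y][x] = 1

degree = [sum(B[x][y] for x in range(n)) for y in range(n)]      # n_y
R = [[B[x][y] / degree[y] for y in range(n)] for x in range(n)]  # R_xy = B_xy / n_y
p0 = [d / sum(degree) for d in degree]  # stationary state of this walk

def slowest_left_eigenvector(R, p0, n_steps=500):
    """Power iteration for A_1 with the constant vector A_0 projected out
    (assumes the second-largest eigenvalue in modulus is real and positive)."""
    A = [float(x) for x in range(n)]
    for _ in range(n_steps):
        A = [sum(A[x] * R[x][y] for x in range(n)) for y in range(n)]
        overlap = sum(A[x] * p0[x] for x in range(n))
        A = [a - overlap for a in A]
        scale = max(abs(a) for a in A)
        A = [a / scale for a in A]
    return A

A1 = slowest_left_eigenvector(R, p0)
grain_plus = {x for x in range(n) if A1[x] > 0}
grain_minus = set(range(n)) - grain_plus
print([round(v, 3) for v in A1])
print(sorted(grain_plus), sorted(grain_minus))
```

For an undirected network the stationary state is proportional to the number of ties of each individual, which is why p0 can be written down without any iteration.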
B. Reflections

The effectiveness of the coarse graining scheme shows that in a sense time reconstructs
space. And not only space, but order parameters as well. (I omitted another example we
studied, heat transmission, in which energy density emerged.) Others have found similar
results. For example, Halliwell [45, 46], working at the quantum level, shows that the 
quantities most effective at decohering, hence having the best chance to make it to the 
classical level, are those that are slowest to change, for example, local energy density. 

It's amusing that some people have eliminated time from dynamics [47]. We go in the 
other direction, showing the pre-eminence of time in the creation of macroscopic perceptions. 

X. QUANTUM MEASUREMENT THEORY 

In this section I will mostly talk about my own work in quantum measurement theory. 
However, irrespective of the merits of my proposal, there are several points of general interest 
that anyone who worries about this question should come away with. First, there is the 
realization that quantum measurement theory is a problem in statistical mechanics. This, 
perhaps surprising, assertion reflects the view that statistical mechanics studies the interface 
between the macroscopic and microscopic. You derive equations of state from particle
dynamics — for lots of particles. The "problem" of quantum measurement theory is exactly 
the reconciling of micro and macro, the fact that experiments appear to have a definite 
outcome, while pure unitary time evolution gives rise to superpositions of living and dead 
cats. Except for this apparent contradiction, there is no measurement problem, and, given 
the success of other quantum predictions it would never enter anyone's mind to postulate 
funny mechanisms for "collapse of the wave function" or other nonsense (if I may allow 
my opinions to show). A second general message from the forthcoming presentation is that 
"measurement" should not be considered black magic. The separation of the world into 
system and apparatus, with apparatus treated more or less classically, could be maintained 
in 1930, when Bohr and Einstein famously debated these issues. But today, with fully 
quantum treatments of phenomena reaching the mesoscopic level, there is no excuse for 
not treating the apparatus as another quantum system, albeit a big one. Moreover, its 
"bigness" is reflected not only in its having a large total mass (as some simple-minded 
models of apparatus have done), but also in its having many degrees of freedom. A bona 
fide measurement amplifies and registers. That is, it promotes a microscopic event to one 
that we sense at a meso- or macroscopic level and it does it in such a way that something 
irreversible happens. These processes again require many degrees of freedom, so that we 
have been brought back to statistical mechanics, fleshing out my earlier contention. As 
part of the presentation of my work, and as a way to check it, I will give an example 
below of a fully quantum apparatus. If you happen to have your own ideas on quantum 
measurement, you should also be able to check them in this way. The apparatus model is 
one I developed with Gaveau, since at that point (about 1990) I found nothing suitable, 
or even vaguely approaching the realistic, on which to test my ideas. Other models have 
since been proposed, and besides those in my book [1] I mention that of Allahverdyan et al. 
[48, 49]. 

For the purposes of the ideas that I'm about to present, the most important message 
from the previous lectures is that you should abandon your initial conditions prejudice. The 
"natural" use of macroscopic initial conditions is equivalent to the thermodynamic arrow 
of time. If that arrow arises from something else, for example cosmology, then just as 
one contemplates various cosmologies that may govern a universe, so one can contemplate 
different arrows of time. 

As remarked, the puzzle of quantum measurement theory is the appearance of definite 
macroscopic states as the result of measurements. The explanation I am about to offer 
takes quantum mechanics as inviolate and the wave function as physical. The revolution- 
ary steps are in statistical mechanics, challenging the foundations of that theory in areas 
where it does not have experimental support. And the main challenge will be that when 
you see a macroscopic state, you do not assume that all microstates consistent with the 
macrostate are equally likely. In particular, you may find cryptic constraints, as we did for 
the two-ended low-entropy ideal gas, evolving under the cat map. Thus statistical mechanics 
enters profoundly and in a way that goes beyond the treatment of a many-degree-of-freedom 
apparatus. That's the end of the story; now let me start from the beginning. 



A. Précis of the idea: the schizoid cat 

Here's an example of the quantum measurement problem: you have a two-state system 
described by a wave function $\phi$. At the beginning of the experiment, $\phi = \alpha_1\phi_1 + \alpha_2\phi_2$. 
($\phi_k$ is assumed normalized and $\sum_k |\alpha_k|^2 = 1$.) The wave function of the apparatus measuring 
this system is $\Omega$. Initially the total wave function is 

$$\Psi_i = (\alpha_1\phi_1 + \alpha_2\phi_2)\,\Omega . \tag{123}$$

The total Hamiltonian is $H$, and after a measurement taking time $t$ the wave function 
becomes 

$$\Psi_f = \exp(-iHt/\hbar)\left[(\alpha_1\phi_1 + \alpha_2\phi_2)\,\Omega\right] = \alpha_1\phi_1\Omega_1 + \alpha_2\phi_2\Omega_2 , \tag{124}$$

where the $\Omega_k$ are macroscopically distinct states, meaning that no practical apparatus (having 
operators $Q$) can get a nonzero value for $\langle\Omega_1|Q|\Omega_2\rangle$ [50]. This is a "Schrödinger cat," and 
is never observed, although as I indicated it cannot be observed. 

At this point there are numerous strategies to justify the statement that one of these 
two states is randomly selected with probability $|\alpha_k|^2$ and the other . . . disappears, or is 
irrelevant, or is part of another "world", or ... . 

My contention is that among the myriad of macroscopically indistinguishable microstates 
($\Omega$'s) there are those for which, under pure unitary time evolution, the final state is only 
one of the two possible macrostates. For example, I postulate that there is a state $\Omega'$ for 
which 

$$\Psi_f' = \exp(-iHt/\hbar)\left[(\alpha_1\phi_1 + \alpha_2\phi_2)\,\Omega'\right] = \phi_1\Omega_1' \tag{125}$$

(in fact there are many of them). Correspondingly there are other initial apparatus states 
$\Omega''$ for which 

$$\Psi_f'' = \exp(-iHt/\hbar)\left[(\alpha_1\phi_1 + \alpha_2\phi_2)\,\Omega''\right] = \phi_2\Omega_2'' . \tag{126}$$

Thus when you do an experiment and get outcome #1, it means that the apparatus, prior 
to the experiment, was in a special state of type $\Omega'$. Similarly for #2. The superposition 
principle still works, and if your initial apparatus state were $\Omega' + \Omega''$, you would get a 
Schrödinger-cat state. It's just that this superposition does not appear (as an initial 
condition) in Nature. A little thought about the implications of this idea shows there is (at this 
stage) no role for probability and that there must be a single wave function for the universe. 

The assertions of the last paragraph about the existence of special states are statements 
about quantum mechanics, solutions of Schrödinger's equation for big systems. They are 
difficult assertions, especially since when I first began thinking about these things I found no 
reasonable quantum models of apparatus on which to test them. But I'll show you examples 
where you can find such states. Let's move on to the really difficult part. 

States that lead to only one macroscopic outcome, as in Eq. (125) or Eq. (126), I call 
"special" states. These equations tell you that if you had as your initial conditions in some 
experiment a special state (I'll dispense with the quotation marks) then you have no cat 
problem. Good, but why should you have this sort of initial condition? 

Our experience with two-time boundary value problems suggests that a final condition 
can select, in a precise — meaning microscopic — way, particular initial conditions. Thus in 
our doubly-conditioned cat map we could get a Poincaré recurrence in 22 time steps, where 
ordinary probability arguments would suggest you'd have to wait $G^N$ time steps, with $G$ 
the number of coarse grains and $N$ the number of particles (which in our examples came 
to $\approx 10^{1000}$). In thinking about the early universe it is reasonable to suppose it to have 
been in a pure state in which distant objects were not entangled. I will now assume that we 
live in a roughly time-symmetric universe, starting at a big bang, ending in a big crunch. 
If we assume that the arrow of time is a consequence of the geometry then for the 
late universe (in our time reckoning) one should have a similar state, and in particular one 
in which distant objects are not entangled. This will be problematic if unitary evolution is 
continually generating Schrödinger cats. One way to have distant objects unentangled "late" 
in the life of the universe would be to maintain purity and localization throughout. The 
alternative would be to have macroscopically different states (the wave functions that went 
off to become other "worlds" in the many-worlds sense) recombine coherently. Having distant 
objects unentangled as the big crunch approaches is unlikely, but the least unlikely way to 
accomplish this is to avoid macroscopic entanglement at every stage. Thus maintaining 
"purity" (of the wave function) throughout means that at each juncture where a "cat" 
could form, it doesn't, namely the state of the apparatus and system is special, in the sense 
of Eqs. (125-126). 

I point out that the use of special states to solve the measurement problem, and the 
explanation just offered for the actual occurrence of special states in Nature, are conceptually 
independent. If special states really are the answer, it would not be the first time someone 
has found a solution without understanding why it works. When Boltzmann (l'havdil) 
postulated an early low-entropy state to solve the arrow of time puzzle he attributed it to a 
fluctuation. He could have no idea of the developments in cosmology that would provide a 
much better explanation for the universe to arrive at an extremely unlikely state (see Secs. 
VII and VIII). In other words, there could be an entirely different reason for the cryptic 
constraint that selects special states, but I don't have the imagination to think of it. 

It also must be emphasized that the idea presented here is a physical one, not an al- 
ternative "interpretation" (and in particular it is not a hidden variable theory). There are 
particular microscopic states that occur in Nature and there are those that don't. To show 
that this idea describes, or may describe, the physical world several tasks must be addressed: 

• Do there exist special states? Sec. X B. As indicated, one can examine particular 
models of apparatus to ascertain whether they have a sufficiently rich class of microstates 
to give the single-outcome time evolution demanded. 

• How do you recover the Born probabilities? Sec. X C. Probability will enter as it 
does in classical mechanics. When there are many special states, you cannot know 
which microstate the apparatus actually is in (same as when you flip a coin). The 
probabilities are then the measures (relative dimensions of the Hilbert subspaces) of 
the states associated with each outcome. 

• Can experiment determine if this theory is correct? Sec. X D. Something may be 
physically true even if you can't, today, do an experiment to prove it (string theorists 
will be sympathetic to this assertion). Nevertheless, I will suggest tests that can be 
made of this theory. 

FIG. 29: Three stages in the operation of a cloud chamber. After expansion (left figure) the vapor 
is supercooled, but in the absence of disturbance would not form liquid for a long time. The middle 
figure shows a particle traversing the chamber, ionizing two atoms in its wake. On the right, the 
disturbing particle has long passed and droplets have formed at the sites of the ionization. One 
deduces the path of the particle (dashed line) from the droplets. 

B. Existence of special states 

A schizoid cloud chamber 

I will exhibit special states in a model of a cloud chamber. First I'll review a bit about 
cloud chambers, then offer a fully quantum model and show how a measurement would 
create a superposition of macroscopically distinct states. Then I'll identify the special states 
within the model. 

A cloud chamber (see Fig. 29) detects the path of a particle, say a cosmic ray, by forming 
small droplets of liquid within a vapor (the "cloud") along the path [51]. Like all appa- 
ratus, it functions as an amplifier to promote the microscopic to the macroscopic. This is 
accomplished by putting the vapor into a metastable state. Within the chamber, besides 
the material of the vapor, there is a non-condensable gas. By expanding the chamber, the 
temperature is rapidly lowered. At the new pressure and temperature the vapor should be 
liquid, but this does not happen instantly because the system first must form a sufficiently 
large droplet (larger than a "critical droplet" [52]). The passage of the cosmic ray helps 
create such droplets. After that they grow rapidly, although in practice this growth only 
takes place long enough for the whole thing to be photographed. Then the chamber is 
compressed, warming it and reestablishing the vapor. 

In any small volume of supercooled gas through which the charged particle passes, there 
may or may not be an ionization event due to that passage. The scattering and (possible) 
ionization are describable by unitary time evolution within a quantum mechanical framework. 
If one were to calculate the results of the scattering there would be two significant 
components to the wave function, one in which the atom was ionized and one in which it 
was not. These are the Schrödinger cat states. The relative norms of these components give 
the probabilities of droplet formation and non-formation within that volume. (This way of 
generating a superposition of macroscopically different states is not quite the same as pre- 
sented in Eq. (124), where the initial wave function already has a superposition structure. 
The principle however is the same. Below, in Sec. X C, I'll discuss states having the form of 
Eq. (123).) 

I will next make a model of this system, not terribly realistic, but nevertheless possessing 
the essential properties of the cloud chamber: metastability, amplification and irreversibility. 
So it will behave like an apparatus. Then I will display its special states. 
Remark: In the next several pages I present a number of detailed calculations, and it will 
be easy to get bogged down in details and lose sight of the point of all this arithmetic. Let 
me here outline the steps. 

• Defining the Hamiltonian 

• Qualitative description of the operation of the detector 

• Calculations supporting the qualitative description 

— Spontaneous spin flip: false positives and ground state stability 

— Scattering-induced spin flips 

• Résumé of the detector operation 

• Special states. This is the point of the story! After slogging through the detailed 
operation of the detector, you will see that indeed, although it usually (i.e., for most 
initial conditions) gives a "cat" state, for some particular initial conditions it gives 
only a single outcome, one or the other definite state of the "cat." 

Defining the Hamiltonian 

For the metastable system I use a three-dimensional array of quantum spins. They 
interact with ferromagnetic short range forces. In the presence of a magnetic field they align 
along that field. On the left in Fig. 30 I show a two-dimensional slice of an array in which a 
magnetic field points up. For convenience I assume the temperature is so low that even for a 
fairly large array no spin is reversed [53] . To produce a metastable state one makes a sudden 
reversal of the magnetic field. The upward pointing spins are no longer the lowest state of 
the system, but their transition to the stable state is slow and the system is metastable. If 
the field is strong the "critical droplet" size can be 1, so that with a single spin flip it will 
be energetically favorable for the region of overturned spins to grow. 

Here is how the array functions as a detector. The particle to be detected, called X, 
interacts with the spins in such a way that it tends to flip them. When X traverses the array 
it can cause a spin to reverse. See the right hand image of Fig. 30. The neighbors of that 
spin are now in a more favorable situation for their own flips. With no spins reversed, six 
neighbors oppose a flip (on a cubic lattice). But after a reversal, a neighboring spin has only 
five opposed and now one neighbor favors the flip. Subsequent flips are yet more enhanced 
and the "droplet" grows. 

FIG. 30: Two-dimensional slice from an array of spins. On the left, before the apparatus is 
primed, the upward pointing field (indicated by the large arrow) orients the spins upward and the 
temperature is low enough so that opposite pointing spins are unlikely. On the right, the magnetic 
field has been reversed, so that all-spins-up is metastable, not stable. A particle has passed through 
the array, flipping one spin and being scattered slightly in the process. Subsequently the neighbors 
of the flipped spin will turn over because the reversed bond allows them to align with the field. 

As the Hamiltonian for the interactions affecting only the spins take 

$$H_s = -J \sum_{\langle \ell\ell' \rangle} \sigma_{\ell z}\,\sigma_{\ell' z} + h \sum_{\ell} \sigma_{\ell z} , \tag{127}$$

where $\ell$ and $\ell'$ range over the spins and $\langle\ell\ell'\rangle$ means summation over nearest neighbor pairs. 
(Take $\hbar = 1$.) For ferromagnetism, the spin-spin coupling constant $J$ is positive. The 
external field is $h$. The operator $\sigma_{\ell z}$ is the tensor product of the quantum spin operator $\sigma_z$ 
for the $\ell$th spin and the identity operator for the others. 
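
As a check on the energy bookkeeping used later (a minimal sketch, not part of the original notes), the cost of a single flip under Eq. (127) can be computed by brute force on a small periodic cubic lattice. The lattice size `L`, the values of `J` and `h`, and the helper names are all illustrative assumptions.

```python
import itertools

def spin_energy(spins, L, J, h):
    """Energy of H_s = -J sum_<ll'> s_l s_l' + h sum_l s_l on an L^3 periodic cubic lattice.

    `spins` maps (x, y, z) -> +1 or -1. Each nearest-neighbor bond is counted once.
    """
    energy = 0.0
    for (x, y, z), s in spins.items():
        energy += h * s
        # Visit only the +x, +y, +z neighbors so that every bond appears exactly once.
        for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            neighbor = ((x + dx) % L, (y + dy) % L, (z + dz) % L)
            energy += -J * s * spins[neighbor]
    return energy

def flip_cost(spins, site, L, J, h):
    """Energy change when the spin at `site` is reversed."""
    before = spin_energy(spins, L, J, h)
    flipped = dict(spins)
    flipped[site] = -flipped[site]
    return spin_energy(flipped, L, J, h) - before

L, J, h = 4, 1.0, 6.5  # illustrative values chosen so that 2h is slightly above 12J
all_up = {site: 1 for site in itertools.product(range(L), repeat=3)}

first_flip = flip_cost(all_up, (0, 0, 0), L, J, h)        # all six neighbors oppose
one_down = dict(all_up)
one_down[(0, 0, 0)] = -1
neighbor_flip = flip_cost(one_down, (1, 0, 0), L, J, h)   # five oppose, one favors
```

Under these assumptions the first flip changes the energy by $12J - 2h$ and a neighbor's subsequent flip by $8J - 2h$, so the second step releases more energy than the first.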

Besides the spin degrees of freedom there is the underlying lattice whose phonons interact 
with the spins. Take the combined lattice and lattice-spin Hamiltonian to be 

$$H_p = \sum_{k=1}^{N} \omega_k b_k^\dagger b_k + \sum_{\ell}\sum_{k=1}^{N} \left( \gamma_k e^{i\phi_k(\ell)}\sigma_{\ell +} b_k + \gamma_k^* e^{-i\phi_k(\ell)}\sigma_{\ell -} b_k^\dagger \right) . \tag{128}$$

The operators $b_k$ and $b_k^\dagger$ are the phonon annihilation and creation operators for the $k$th mode 
(out of $N$) of the lattice. The mode's frequency is $\omega_k$. The phonons couple to the spins 
with coupling constants $\gamma_k$, together with a phase, $e^{i\phi_k(\ell)}$, that varies with the position of 
the spin on the lattice. The operator $\sigma_{\ell +}$ is the raising operator for the $\ell$th spin and it and 
$\sigma_{\ell -}$ are defined analogously to $\sigma_{\ell z}$. (These interactions are completely standard and in a 
photon context give the Jaynes-Cummings Hamiltonian.) 

Finally we present the portion of the Hamiltonian for the particle (X) to be detected. We 
assume that the effect of X is to induce a strong coupling between the spin and the phonons. 
Its Hamiltonian is taken to be 

$$H_d = \frac{p^2}{2m} + \sum_{\ell}\sum_{k=1}^{N} v(x - x_\ell) \left( c_k e^{i\phi_k(\ell)}\sigma_{\ell +} b_k + c_k^* e^{-i\phi_k(\ell)}\sigma_{\ell -} b_k^\dagger \right) . \tag{129}$$

Again, the $c_k$ are coupling constants and the function $v(u)$ is non-zero only when $|u|$ is near 
zero. We take $v$ dimensionless and of order unity. The $c_k$ are much larger than the usual 
lattice-spin coupling, i.e., $|c_k| \gg |\gamma_k|$. 

The total Hamiltonian for the system is the sum of the three pieces we have displayed, 
$H_{\rm total} = H_s + H_p + H_d$. This model does not have an exact solution. But the point is not 
to provide exact solutions, only to arrive at the physical behavior, to show that notwith- 
standing an entirely quantum treatment, we have a system that behaves like a measuring 
apparatus. This is established in the following way. We first describe the response of the 
system qualitatively and then indicate what specific calculations are needed to justify the 
statements. Some of those calculations will be done here, some are in [1], some in [54]. 
Having done this, we will return to our primary goal, the exhibiting of special states. 



Qualitative description of the operation of the detector 

Since $v(\cdot)$ is short ranged, the impinging particle effectively interacts with only one spin. 
With this spin there is significant interaction for only a short time. As a result, once the 
particle is out of range of the spin, there is amplitude for having flipped the spin, amplitude 
for not having flipped. The wave function will have two components, not yet macroscopically 
distinct. In either case X moves away from the array and the next stage takes place by means 
of the coupling due to the $\gamma_k$. For the component that has not flipped, the system is more 
or less where it was before, and it is unlikely to flip. For the wave function component with 
the (single) flipped spin, the $\gamma_k$-coupling can either unflip the spin, or can cause neighboring 
spins to flip. Because the energy balance has shifted, the most likely thing to happen is the 
flipping of the neighbors of the spin that was hit by X. Once this occurs the tendency becomes 
overwhelming. Since unflipping was unlikely, even after a single spin flip, the probability of 
detection (by spin #0) will be the square of the amplitude for flipping after the single-spin 
encounter. 

The most important calculations required for analysis of the model are 1) flip probability 
for a single scattering event; 2) false positives: probability of a flip in the absence of any 
external particle (i.e., via $\gamma_k$ alone). 

1. Spontaneous spin flip: false positives and ground state stability 

False positives result from a decay process and are easy to calculate. Such spontaneous 
transitions can be computed using the formalism developed for decay in Sec. II. Because 
our Hamiltonian $H_p$ only allows creation or annihilation of a single phonon, one can identify 
the matrix elements that correspond to the Hamiltonian given in Eq. (55). That form was 

[block form of Eq. (55); equation not recoverable in this copy] (130) 

where the first component ("x," to be called component number "0") is the initial state 
and the other components ("Y") are the decay products. We take as the initial state 
$|+, n_1, n_2, \ldots\rangle$; here the integers $n_k$ are the eigenvalues of $b_k^\dagger b_k$ and the "+" represents the 
state of the particular spin (from the array) on which we now focus. The diagonal matrix 
element of the Hamiltonian for this initial state has the value $H_{00} = -6J + h + \sum_k n_k\omega_k$. 
This state couples to all states of the form $|-, n_1, n_2, \ldots, (n_{k'} + 1), \ldots\rangle$, $k' = 1, 2, \ldots$. These 
states have diagonal matrix elements $H_{00} + 12J - 2h + \omega_{k'}$. The coupling due to the terms 
$\gamma_k e^{i\phi_k(\ell)}\sigma_{\ell +} b_k$ and their adjoints gives rise to $\gamma_k e^{i\phi_k(\ell)}\sqrt{n_k + 1}$ in the first row of the 
Hamiltonian (to the right of the "00" diagonal term), and to its adjoint in the first column. In Sec. 
II we calculated the transition probability for the decay, which in this context is spin flip. 
The rate is given by the "Golden Rule," as found in Eq. (67). To use that formula we note 
that the energy deposited in the phonon mode matches initial and final energies, namely, 
$\Delta E = 2h - 12J$, which we here take to be slightly positive. Let $k$ be the phonon index for 
which $\omega_k = \Delta E$, then in the continuum limit the "Golden Rule" gives 

$$\Gamma = 2\pi \langle n_k \rangle \rho(k) |\gamma_k|^2 . \tag{131}$$

The factor $\langle n_k \rangle$ is the expected number of phonons in the initial state, and for inverse 
temperature $\beta = 1/k_B T$ is given by $[\exp(\beta\Delta E) - 1]^{-1}$. For practical situations, $\langle n_k \rangle$ is of 
order unity. Under the assumed physical conditions, the important factor in Eq. (131) is 
$\rho(k)$, the density of states. For low energy phonons this is proportional to the square of 
the frequency; thus $\Gamma \sim \Delta E^2 \ll 1$. It follows that if the detector contains $M$ spins it will 
be unlikely to have any spontaneous flips provided $M\,\Delta E^2 |\gamma_k|^2 \ll 1$. This means that our 
detector is not "macroscopic" in the sense of operating at any possible size, but is limited 
to dimensions that could be called mesoscopic. Once the signal is registered on our detector 
it can be recorded by a second level detector, just as the droplets in the cloud chamber are 
photographed before the next decompression/compression cycle. 
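
To get a feeling for the false-positive estimate, the following sketch evaluates the rate as written in Eq. (131) and the expected number of spontaneous flips. It is only an illustration: units have $\hbar = 1$, the prefactor of the density of states (`rho_prefactor`) is a hypothetical constant, and all numerical values are made up.

```python
import math

def bose_occupation(beta, delta_e):
    """Mean phonon number <n_k> = 1/(exp(beta*dE) - 1) for the mode with omega_k = dE."""
    return 1.0 / math.expm1(beta * delta_e)

def golden_rule_rate(beta, delta_e, gamma_abs, rho_prefactor=1.0):
    """Gamma = 2*pi*<n_k>*rho(k)*|gamma_k|^2 with rho(k) = rho_prefactor * dE^2.

    rho_prefactor is a hypothetical stand-in for the material-dependent factor
    in the low-energy phonon density of states.
    """
    rho = rho_prefactor * delta_e ** 2
    return 2.0 * math.pi * bose_occupation(beta, delta_e) * rho * gamma_abs ** 2

def expected_false_positives(num_spins, rate, window):
    """Expected number of spontaneous flips among num_spins spins during `window`."""
    return num_spins * rate * window

rate = golden_rule_rate(beta=10.0, delta_e=0.05, gamma_abs=0.01)
false_hits = expected_false_positives(num_spins=1000, rate=rate, window=10.0)
```

With these made-up numbers `false_hits` is well below one. Increasing `num_spins` or `window` eventually violates the condition, which is the sense in which the detector is limited to mesoscopic size.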



2. Scattering-induced spin flips 



For convenience we assume that X traverses the detector rapidly, on a time scale such 
that little takes place due to the usual spin-phonon interaction (mediated by the $\gamma_k$) or among 
the phonons. Also we take the kinetic energy of X to be large compared to the detector 
interaction; thus the bend in the trajectory in Fig. 30 is exaggerated. Finally, the range of 
the potential $v$ is such that only a single spin is involved in the initial stage of detection. 
Let that spin be labeled "0" and suppose (without loss of generality) that $\phi_k(0) = 0$ ($\phi_k$ is 
the phase in Eq. (128)). Under these circumstances we can confine attention to this spin, 
for which the result of the interaction will be 

$$\psi_0 \to \exp\left[ -i \int dt\, v(x(t) - x_0) \sum_{k=1}^{N} \left( c_k \sigma_{0+} b_k + c_k^* \sigma_{0-} b_k^\dagger \right) \right] \psi_0 , \tag{132}$$

where $\psi_0$ is the initial spin wave function of spin #0, which is $\binom{1}{0}$ (i.e., "up"). The function 
$x(t)$ is the path of X, which we assume unaffected by the interaction. We can now write 
$T = \int dt\, v(x(t) - x_0)$, an effective contact time (since $v$ is dimensionless). We are therefore 
interested in the unitary matrix 

$$U = \exp\left[ -iT \sum_{k=1}^{N} \left( c_k \sigma_{0+} b_k + c_k^* \sigma_{0-} b_k^\dagger \right) \right] . \tag{133}$$
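
The effective contact time $T$ depends strongly on how closely X passes the spin. The sketch below makes this concrete under assumptions that are not in the text: a Gaussian profile $v(u) = \exp(-|u|^2/a^2)$ and a straight-line trajectory at constant speed with a given impact parameter. The function names and numbers are hypothetical.

```python
import math

def contact_time(impact_parameter, speed, interaction_range, t_max=50.0, steps=20000):
    """Trapezoid-rule estimate of T = integral dt v(x(t) - x_0).

    Assumptions (not from the text): v(u) = exp(-|u|^2 / range^2), and X moves on the
    straight line x(t) = (speed * t, impact_parameter) with the spin at the origin.
    """
    dt = 2.0 * t_max / steps
    total = 0.0
    for i in range(steps + 1):
        t = -t_max + i * dt
        dist_sq = (speed * t) ** 2 + impact_parameter ** 2
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * math.exp(-dist_sq / interaction_range ** 2) * dt
    return total

def contact_time_exact(impact_parameter, speed, interaction_range):
    """Closed form of the same Gaussian integral, used as a cross-check."""
    a = interaction_range
    return math.exp(-(impact_parameter / a) ** 2) * a * math.sqrt(math.pi) / speed

T_close = contact_time(impact_parameter=0.1, speed=5.0, interaction_range=1.0)
T_far = contact_time(impact_parameter=2.0, speed=5.0, interaction_range=1.0)
```

For this profile a pass at two interaction ranges gives a contact time smaller by a factor of roughly $e^{-4}$ than a near-central pass, illustrating why $T$ can vary widely from event to event.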



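We henceforth drop the subscript "0" referring to the spin number. To obtain an explicit 
form for $U$, define a new set of boson operators. Let 

$$\beta_1 = \sum_k \frac{c_k}{c}\, b_k \quad \text{where} \quad c \equiv \sqrt{\sum_k |c_k|^2} . \tag{134}$$

(The "1" is the phonon-state label.) The other boson operators, $\beta_2, \ldots, \beta_N$, are built from 
combinations of the $b_k$ ($k = 1, \ldots, N$) that are independent of $\beta_1$. In terms of the $\beta$'s 

$$U = \exp\left( -icT (\beta_1 \sigma_+ + \beta_1^\dagger \sigma_-) \right) . \tag{135}$$

It is an exercise in spin matrices to evaluate $U$. With the definition 

$$\nu_1 \equiv \beta_1^\dagger \beta_1 \quad \text{(an operator)} \tag{136}$$

we find [55] 

$$\begin{aligned}
U &= \exp\left( -icT (\beta_1 \sigma_+ + \beta_1^\dagger \sigma_-) \right) \\
  &= \sigma_+\sigma_- \cos\left( cT\sqrt{\nu_1 + 1} \right) + \sigma_-\sigma_+ \cos\left( cT\sqrt{\nu_1} \right) \\
  &\quad - i \left\{ \frac{\sin\left( cT\sqrt{\nu_1 + 1} \right)}{\sqrt{\nu_1 + 1}}\, \beta_1 \sigma_+ + \frac{\sin\left( cT\sqrt{\nu_1} \right)}{\sqrt{\nu_1}}\, \beta_1^\dagger \sigma_- \right\} .
\end{aligned} \tag{137}$$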
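
One way to organize the "exercise in spin matrices" is sketched here. It uses only $\sigma_\pm^2 = 0$ and $[\beta_1, \beta_1^\dagger] = 1$; the shorthand $A$ is introduced for convenience and does not appear in the original.

```latex
\begin{align*}
A &\equiv \beta_1\sigma_+ + \beta_1^\dagger\sigma_- , \qquad U = e^{-icTA},\\
A^2 &= \beta_1\beta_1^\dagger\,\sigma_+\sigma_- + \beta_1^\dagger\beta_1\,\sigma_-\sigma_+
     = (\nu_1+1)\,\sigma_+\sigma_- + \nu_1\,\sigma_-\sigma_+ ,\\
A^{2n} &= (\nu_1+1)^n\,\sigma_+\sigma_- + \nu_1^n\,\sigma_-\sigma_+ ,\\
A^{2n+1} &= A^{2n}A = (\nu_1+1)^n\,\beta_1\sigma_+ + \nu_1^n\,\beta_1^\dagger\sigma_- .
\end{align*}
```

Summing the even powers of the exponential series gives the two cosine terms of Eq. (137), and summing the odd powers gives the two sine terms, with the functions of $\nu_1$ standing to the left of $\beta_1$ and $\beta_1^\dagger$.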

Using $U$ we immediately get the wave function at time $t$. Because of the spin index and 
the two different phonon bases that we use, there is a certain notational overload. From the 
form of $U$ it is obvious that the eigenvalues of $\nu_k = \beta_k^\dagger \beta_k$, for $k \neq 1$, are of no importance for 
the action of $U$. Therefore in looking at the evolution of the spin-0 wave function and the 
phonon wave functions we specify only the spin state (of spin 0) and the number of phonons 
of type 1, i.e., the eigenvalue of the operator $\nu_1$. Let this eigenvalue also be designated $\nu_1$. Thus if the initial 
spin-0-cum-phonon wave function is $\psi(0) = |+, \nu_1\rangle$, by Eq. (137) it evolves to 

$$\begin{aligned}
\psi(t) &= \cos\left( cT\sqrt{\nu_1 + 1} \right) |+, \nu_1\rangle - i\, \frac{\sin\left( cT\sqrt{\nu_1 + 1} \right)}{\sqrt{\nu_1 + 1}} \sqrt{\nu_1 + 1}\; |-, \nu_1 + 1\rangle \\
        &= \cos\left( cT\sqrt{\nu_1 + 1} \right) |+, \nu_1\rangle - i \sin\left( cT\sqrt{\nu_1 + 1} \right) |-, \nu_1 + 1\rangle .
\end{aligned} \tag{138}$$



This is the state of the spin after X has passed [56]. If the remaining processes are nearly certain 
(when the spin is down, the whole thing turns over, otherwise it does not) then Eq. (138) gives 
the detection and non-detection amplitudes. For a thermal distribution of phonons in the 
initial state, the probability of detection is 

$$\Pr(\text{detection}) = \frac{1}{Z} \left\langle \sum_{\nu_1} e^{-\beta\Omega\nu_1} \sin^2\left( cT\sqrt{\nu_1 + 1} \right) \right\rangle_{cT} , \tag{139}$$

where $Z = 1/(1 - \exp(-\beta\Omega))$, $\Omega = \sum_k |c_k|^2 \omega_k / c^2$ (with $\beta$ the inverse temperature), and the 
subscript $cT$ on the angular averaging brackets indicates an average over particle-spin contact 
times ($T$ is built from the trajectory of X) [57]. 
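
The thermal sum in Eq. (139) is easy to evaluate numerically at a single fixed contact time. The sketch below does only that: the average over contact times is not performed, and the cutoff `nu_max` and the sample values are illustrative assumptions.

```python
import math

def detection_probability(c_times_T, beta_omega, nu_max=2000):
    """(1/Z) * sum_nu exp(-beta*Omega*nu) * sin^2(cT*sqrt(nu+1)) at one fixed contact time.

    The average over contact times in Eq. (139) is NOT performed here; nu_max is a
    hypothetical cutoff, adequate when exp(-beta_omega * nu_max) is negligible.
    """
    z = 1.0 / (1.0 - math.exp(-beta_omega))
    total = 0.0
    for nu in range(nu_max + 1):
        weight = math.exp(-beta_omega * nu)
        total += weight * math.sin(c_times_T * math.sqrt(nu + 1)) ** 2
    return total / z

p_cold = detection_probability(c_times_T=0.7, beta_omega=50.0)  # essentially nu_1 = 0 only
p_warm = detection_probability(c_times_T=0.7, beta_omega=0.5)   # many phonon numbers contribute
```

At low temperature only $\nu_1 = 0$ contributes and the result reduces to $\sin^2(cT)$; at higher temperature the different $\nu_1$ sample different rotation angles $cT\sqrt{\nu_1 + 1}$.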



3. Operation of the detector 

This is everything we need to know about the detector. After X traverses the array, the 
wave function is of the form given in Eq. (138). The neighbors of spin 0 now flip due to 
their interaction with the phonons and driven by the external field. This flip rate is much 
greater than that for spontaneous flips. In three dimensions, flipping $M$ spins takes on the 
order of $M^{1/3}$ flip times (at the fast, alignment-producing flip rate). If in this interval the 
probability of a spontaneous flip is small, then false positives are unimportant. 

With M spins overturned, even for modest M (perhaps 1000 or fewer), we have entered 
the classical domain in two senses. 1) The overturning of the spins is irreversible. Poincaré 
recurrence times are enormous. 2) The disturbance is large enough so that subsequent 
evolution (e.g., a probe of the detector's magnetic moment) can be treated classically. 

We next look at the wave function of the detector. After the first spin has completed 
its contact with X, the detector wave function is given by Eq. (138) (or a superposition of 
such terms) crossed into the wave function of all the other spins and phonons. Continuing, 
the wave function of the entire system after a further time interval, t, sufficiently long for a 
large number of spins to fall, is given by the action of exp(-iHt) on the full detector state 
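just described. Thus 

$$\Psi = e^{-iHt} \left\{ \cos\left( cT\sqrt{\nu_1 + 1} \right) |+, \nu_1\rangle - i \sin\left( cT\sqrt{\nu_1 + 1} \right) |-, \nu_1 + 1\rangle \right\} \otimes |\text{spins other than 0, all of which are up}\rangle \otimes |\text{phonons other than 1, all of which are unchanged}\rangle . \tag{140}$$

As discussed above, the action of $e^{-iHt}$ leads to 

$$\Psi = \cos\left( cT\sqrt{\nu_1 + 1} \right) |\text{all spins up; phonons hardly changed}\rangle - i \sin\left( cT\sqrt{\nu_1 + 1} \right) |\text{spins down; phonons somewhat excited}\rangle . \tag{141}$$

This is a Schrödinger cat. 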



4. Scattering-induced spin flips with SPECIAL initial conditions 

Our explicit microscopic construction of a quantum measurement apparatus has had one 
purpose: to exhibit special states for that apparatus. After the passage of the particle X, 
the wave function of spin #0 and the phonons was 

$$\psi(t) = \cos\left( cT\sqrt{\nu_1 + 1} \right) |+, \nu_1\rangle - i \sin\left( cT\sqrt{\nu_1 + 1} \right) |-, \nu_1 + 1\rangle , \tag{142}$$

where the only important fact about the phonon state is its quantum number with respect 
to a particular mode, "1"; the other modes can be doing anything, and are unaffected by X. 
(With the departure of X the energy is spread, since the mode "1" is not an eigenstate of 
the Hamiltonian.) Furthermore, we consider a situation where once this one spin is flipped 
the continuation of the process is inevitable, and had it not flipped, nothing would have 
happened. 
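
Eq. (142) can be checked numerically without using the closed form of $U$. The sketch below applies the power series of $\exp(-icT(\beta_1\sigma_+ + \beta_1^\dagger\sigma_-))$ to $|+, \nu_1\rangle$ in a truncated phonon basis. The truncation `n_max`, the number of series terms, and the helper names are choices made for this illustration.

```python
import math

def apply_A(vec, n_max):
    """Apply A = beta_1 sigma_+ + beta_1^dagger sigma_- in the basis |s, n>, n = 0..n_max.

    Index layout: up-spin amplitudes first (index n), then down-spin amplitudes (dim + n).
    """
    dim = n_max + 1
    out = [0j] * (2 * dim)
    for n in range(dim):
        # beta_1^dagger sigma_-: |+, n> -> sqrt(n+1) |-, n+1>
        if n + 1 <= n_max:
            out[dim + n + 1] += math.sqrt(n + 1) * vec[n]
        # beta_1 sigma_+: |-, n> -> sqrt(n) |+, n-1>
        if n >= 1:
            out[n - 1] += math.sqrt(n) * vec[dim + n]
    return out

def evolve(nu, c_times_T, n_max=None, terms=80):
    """Taylor-sum exp(-i cT A) acting on |+, nu>; returns (amp_up, amp_down)."""
    n_max = nu + 2 if n_max is None else n_max
    dim = n_max + 1
    term = [0j] * (2 * dim)
    term[nu] = 1.0 + 0j
    psi = list(term)
    for k in range(1, terms):
        stepped = apply_A(term, n_max)
        term = [(-1j * c_times_T / k) * a for a in stepped]
        psi = [p + t for p, t in zip(psi, term)]
    # Amplitudes of |+, nu> and |-, nu+1>
    return psi[nu], psi[dim + nu + 1]

amp_up, amp_down = evolve(nu=3, c_times_T=0.9)
```

Because $|+, \nu_1\rangle$ and $|-, \nu_1 + 1\rangle$ span an invariant subspace, the truncation introduces no error as long as `n_max` is at least $\nu_1 + 1$.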

Now $T$ is a number that will vary widely, depending on the proximity of X to the spin; 
there may also be variation in the coupling constants, $c_k$. In addition, from run to run there 
can be a wide range of values of $\nu_1$, representing thermal fluctuations in the lattice near 
the spin site 0. Suppose it should happen that $\theta \equiv cT\sqrt{\nu_1 + 1} \approx \pi/2$. Then (by Eq. (142)) 
the spin-up component of $\psi(t)$ has coefficient $\cos\theta$, which is close to zero. The spin-down 
component, by contrast, will have magnitude near 1. With these values of $T$ and $\nu_1$ the final 
apparatus state is definite. On the other hand, for another scattering event it may happen 
that $\theta \approx \pi$, so that despite the rather close coupling of scatterer and detector, there will be 
hardly any effect on the spin array. One spin does a handstand, but recovers completely. 
Again the final apparatus state is definite. As can immediately be seen, these are but two 
possibilities for special states, states for which the experiment gives a definite result and the 
system does not find itself in a Schrödinger cat state at the end of the experiment. 

It follows that if it should happen that for a given passage of an X particle the initial state 
(of X and of the phonons) is an eigenstate of $\nu_1$ with an eigenvalue such that $\theta$ ($= cT\sqrt{\nu_1 + 1}$) 
is close to [integer] $\times\ \pi/2$, then the final state will not be a superposition of two different 
macroscopic states, but only a single such state. This identifies a class of special states for 
the cloud chamber model apparatus. 
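
The selection rule just stated can be turned into a small classifier. This is a sketch only: the tolerance that decides how close $\theta$ must be to a multiple of $\pi/2$ is an illustrative threshold, and the value of $cT$ is hypothetical.

```python
import math

def classify_outcome(c_times_T, nu, tolerance=0.05):
    """Label the post-scattering state by theta = cT*sqrt(nu+1).

    Returns 'detected' if |cos(theta)| < tolerance (spin-down branch only),
    'not detected' if |sin(theta)| < tolerance (spin-up branch only), else 'cat'.
    The tolerance is an illustrative threshold, not a quantity from the text.
    """
    theta = c_times_T * math.sqrt(nu + 1)
    if abs(math.cos(theta)) < tolerance:
        return "detected"
    if abs(math.sin(theta)) < tolerance:
        return "not detected"
    return "cat"

def special_phonon_numbers(c_times_T, nu_max, tolerance=0.05):
    """Phonon numbers nu <= nu_max whose outcome is definite for this contact strength."""
    labels = {nu: classify_outcome(c_times_T, nu, tolerance) for nu in range(nu_max + 1)}
    return {nu: label for nu, label in labels.items() if label != "cat"}

c_times_T = math.pi / 4  # hypothetical value of c*T
special = special_phonon_numbers(c_times_T, nu_max=40)
```

For this choice of $cT$ the special phonon numbers up to 40 are $\nu_1 = 3$ and $35$ (definite detection) and $\nu_1 = 15$ (definite non-detection); every other $\nu_1$ in that range leaves a superposition.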

5. Other apparatus models 
In [1], several other apparatus models are studied and their special states shown to exist. 

C. Recovering the usual probabilities 

How does probability enter classical physics? If you flip a coin, the probability of "heads" 
or "tails" is proportional to the region of phase space (of the complete system: coin, arm, 
air, anything else having influence) associated with each outcome. It will be the same in 
quantum mechanics. Since I have only deterministic (unitary) time evolution, this is the 
only possibility. 

Given the vastness of phase space or Hilbert space (for macroscopic systems), once you 
have any special states, you should have many. Moreover, just as they are macroscopically 
indistinguishable from non-special states, so they are indistinguishable from each other. Two 
remarks before I state the probability rule: (1) By the correspondence principle, "volume 
of classical phase space region" becomes "dimension of Hilbert subspace;" (2) if two special 
states have the same macroscopic outcome, so does any linear combination of them, so each 
outcome can be associated with a Hilbert subspace. 

This then is the rule for quantum probabilities: the probability of a particular outcome is 
proportional to the dimension of the subspace of Hilbert space associated with that outcome. 
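
Stated as arithmetic, the rule is a ratio of dimensions. The following sketch uses made-up subspace dimensions for a two-outcome experiment; the outcome labels and numbers are hypothetical.

```python
from fractions import Fraction

def outcome_probabilities(subspace_dims):
    """Probability of each outcome as (dimension of its subspace) / (total dimension)."""
    total = sum(subspace_dims.values())
    if total <= 0:
        raise ValueError("need at least one special state")
    return {outcome: Fraction(dim, total) for outcome, dim in subspace_dims.items()}

# Hypothetical dimensions of the special-state subspaces for the two outcomes.
probs = outcome_probabilities({"up": 3, "down": 1})
```

For the postulate to reproduce standard quantum mechanics, these ratios would have to equal the squared amplitudes of the measured system, which is the issue taken up next.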

This is a far-reaching postulate. Given the difficulty of exhibiting even one class of 
special states, it would seem impossible to classify all special apparatus states and on top 
of that guarantee that their abundance exactly matches the absolute value squared of the 
amplitudes of the system being measured. 

Let me mention my experience in confronting this issue. At first I didn't. Then at some 
point I was explaining these ideas, and persistent questioning [58] made me realize that 
without cleaning this up I was indulging in daydreams. I will explain my general reasoning 
below, but I should say that after a week or so of thinking about this I had reached an 
impasse, a contradiction, and felt that all my work on the quantum measurement problem
had indeed collapsed into daydreams. But there was a way, and when I found it, it had a 
certain cleanness to it (perhaps I should leave that to others to judge). 

The goal then is to take a general approach, one that should not depend too closely on 
the apparatus. For specificity though, imagine a Stern-Gerlach measurement of a spin-1/2 
particle. Suppose it is in a superposition of the following sort:

$$ u_\theta = \cos\frac{\theta}{2} \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \sin\frac{\theta}{2} \begin{pmatrix} 0 \\ 1 \end{pmatrix} . \tag{143} $$

This is the state in which it is prepared. The record of its arrival at a screen downstream 
from the magnets will show an event in one of two regions, one for spin-up, the other 
for spin-down. There could be unusual microstates anywhere along the way (that is easy 
to understand if the rationale for special states is future conditioning). The screen could 
strangely not function correctly, the magnetic field could experience a fluctuation, or the 
single spin-bearing particle could be subject to stray fields or other influences at any time 
after its preparation. As usual when looking for something unusual (the special state in this 
case) one should look for the least unusual. The measure may be small, but it would be 
bigger than the alternatives. It seems to me that acting unusually on a single spin is easier, 
i.e., less unlikely, than acting on all the electrons that give rise to the field in the big magnet, 
or persuading the many atoms involved in detection in the screen to fire or not to fire. 

Before giving the detailed argument I remind you of the expected result. After passing
through the magnets, the amplitude for arriving at the up detector is proportional to
$\cos(\theta/2)$, and for arriving at the down detector is proportional to $\sin(\theta/2)$. The ratio of
down to up detection is therefore $\tan^2(\theta/2)$.

Since we are looking for a general result, we take the perspective of the "system," the 
particle bearing the spin. Whatever the features of the special state it experiences, they 
will look like perturbations, disturbances, as it makes its way from preparation onward. If 
those disturbances could rotate it from the angle $\theta$ in Eq. (143) to the angle 0 or $\pi$ before
it reached the magnet, the special state would have done its job. So without going into the
details of the cause of these disturbances we imagine that there is some likelihood of one or
another disturbance. Let us call such a disturbance a "kick" and suppose that there is some
kick distribution function, $f(\psi)$, with $f$ the relative probability for a kick that will change
the angle $\theta$ in Eq. (143) to $\theta - \psi$ [59]. (It is worth elaborating on the relation of this to
the exact "apparatus" [60] microstates that I earlier called $\Omega$. The idea is that there are
many $\Omega$'s, some of which do not kick the particle spin (so for them $\psi = 0$), some of which
give a big kick, etc. The probability that I associate with a given $\psi$ is the relative number
of Hilbert space dimensions in the overall space of such $\Omega$'s that move $\theta$ by the amount $\psi$.
Thus $\Omega$ provides a fully quantum description of the environment, de facto to be considered
part of what we call apparatus.)

I have now gone far enough into the story to explain why I thought my theory was washed 
up. Clearly, $f(\psi)$ should be largest at $\psi = 0$, and drop off for larger $|\psi|$. So one would expect
to use a succession of small kicks to get from $u_\theta$ to $u_0$ or $u_\pi$. By the central limit theorem, this
would require that $f$ be a Gaussian. But you cannot get the ratio $\tan^2(\theta/2)$ from a Gaussian.
At some point it dawned on me that maybe the assumptions of the central limit theorem
should be dropped, mainly the existence of a second moment for the distribution. So I'll
turn the question around and ask, what function $f$ can give me the ratio I need, and then
worry about its moments.
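
As a numerical illustration (not part of the original argument), the following sketch sums a Gaussian kick distribution over all kicks that are equivalent modulo $2\pi$ and compares the resulting down-to-up ratio with $\tan^2(\theta/2)$. The function names, the truncation `kmax`, and the width `sigma = 0.5` are assumptions chosen only for this example; other widths give different numbers, but a narrow Gaussian always suppresses the large kick far more strongly than $\tan^2(\theta/2)$ allows.

```python
import math


def wrapped_sum(f, theta, kmax=50):
    """Sum f(theta + 2*pi*k) over |k| <= kmax (kicks equivalent modulo 2*pi)."""
    return sum(f(theta + 2.0 * math.pi * k) for k in range(-kmax, kmax + 1))


def gaussian(sigma):
    """Return a Gaussian kick density with standard deviation sigma (hypothetical choice)."""
    def f(psi):
        return math.exp(-0.5 * (psi / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    return f


def down_up_ratio(f, theta, kmax=50):
    """Ratio of the summed weight for reaching 'down' to that for reaching 'up'."""
    return wrapped_sum(f, theta + math.pi, kmax) / wrapped_sum(f, theta, kmax)


if __name__ == "__main__":
    f = gaussian(sigma=0.5)
    for theta in (0.2, 0.5, 1.0, 2.0):
        print(theta, down_up_ratio(f, theta), math.tan(theta / 2.0) ** 2)
```

For any symmetric $f$ the two ratios agree at $\theta = \pi/2$ by symmetry alone, so the comparison is informative only away from that point.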

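To get to the up state you need a kick by $\theta$ or by $\theta + 2\pi$ or $\theta + 4\pi$, etc. Similarly for the
down state you need a kick by $\pi + \theta$, or that, plus multiples of $2\pi$. It is useful to define

$$ F(\theta) \equiv \sum_{k=-\infty}^{\infty} f(\theta + 2k\pi) . \tag{144} $$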

The demand that the ratio of probabilities recover the standard value is the functional
equation

$$ \tan^2\frac{\theta}{2} = \frac{F(\theta + \pi)}{F(\theta)} . \tag{145} $$

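To guess a solution of this equation consider $\theta$ near zero. Presumably in this case $F$ should be
dominated by the $k = 0$ term and the numerator in Eq. (145) should not be doing anything
interesting. Therefore one expects $\theta^2 \sim 1/f(\theta)$. It turns out this is an exact solution! The
following is an identity:

$$ \tan^2\frac{\theta}{2} = \frac{\sum_{k=-\infty}^{\infty} 1/(\theta + \pi + 2k\pi)^2}{\sum_{k=-\infty}^{\infty} 1/(\theta + 2k\pi)^2} . \tag{146} $$

(Each sum is elementary: $\sum_k (\theta + 2k\pi)^{-2} = 1/[4\sin^2(\theta/2)]$, and shifting $\theta$ by $\pi$ turns the sine into a cosine.)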
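
The identity relating $\tan^2(\theta/2)$ to the ratio of the two sums of inverse squares, $\sum_k (\theta + \pi + 2k\pi)^{-2}$ over $\sum_k (\theta + 2k\pi)^{-2}$, can be checked numerically. The sketch below is only an illustration: the function names and the truncation `kmax` are assumptions, and the truncated sums converge slowly (the neglected tail is of order $1/(2\pi^2 k_{\max})$), so agreement is expected only to a few parts in a million.

```python
import math


def inverse_square_sum(theta, kmax=20_000):
    """Truncated sum over |k| <= kmax of 1 / (theta + 2*pi*k)**2."""
    total = 0.0
    for k in range(-kmax, kmax + 1):
        total += 1.0 / (theta + 2.0 * math.pi * k) ** 2
    return total


def down_up_ratio(theta, kmax=20_000):
    """Ratio of the sum shifted by pi to the unshifted sum, for f ~ 1/psi**2."""
    return inverse_square_sum(theta + math.pi, kmax) / inverse_square_sum(theta, kmax)


if __name__ == "__main__":
    for theta in (0.3, 1.0, 2.5):
        # Second and third columns should agree up to the truncation error.
        print(theta, down_up_ratio(theta), math.tan(theta / 2.0) ** 2)
```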



The physics therefore requires that $f(\theta) = \mathrm{const}/\theta^2$, down to $\theta$ as small as has been
experimentally checked. For finiteness however this cannot hold all the way to zero. To find a
good form for the cutoff, I review a bit of mathematics.

As a probability distribution (and irrespective of its small-$\theta$ behavior) $f$ does not have a
second moment, and is just borderline for a first moment. The study of distributions that do
not satisfy the central limit theorem was pioneered by Paul Lévy. Just as the Gaussian is a
kind of basin of attraction (under summation) for random variables having second moments,
so the Lévy distributions are attractors for those lacking such moments. The canonical form
for distributions with $f \sim 1/x^2$ is the Cauchy distribution (a particular Lévy distribution)

$$ C_a(x) = \frac{a/\pi}{x^2 + a^2} . \tag{147} $$

We therefore take $f$ to be the Cauchy distribution with small parameter $a$.

These distributions have lovely and surprising properties. For our purposes I emphasize
the tendency to have large fluctuations. For example, it is about 50 times less likely to get
the result $10a$ than the result $a$ from the Cauchy distribution. For a Gaussian with standard
deviation $a$ it is about $3 \times 10^{21}$ times less likely. For other comparisons see [1].
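
The comparison of large fluctuations can be reproduced directly from the two density functions. In the sketch below (function names are illustrative assumptions), the Cauchy density ratio between $x = a$ and $x = 10a$ is $(100 + 1)/(1 + 1) = 50.5$, while the Gaussian ratio is $e^{49.5}$, a little over $3 \times 10^{21}$.

```python
import math


def cauchy_pdf(x, a):
    """Cauchy density with scale parameter a: (a/pi) / (x**2 + a**2)."""
    return (a / math.pi) / (x * x + a * a)


def gaussian_pdf(x, sigma):
    """Zero-mean Gaussian density with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


a = 1.0  # the ratios below do not depend on this choice of scale
cauchy_ratio = cauchy_pdf(a, a) / cauchy_pdf(10.0 * a, a)      # 101/2 = 50.5
gauss_ratio = gaussian_pdf(a, a) / gaussian_pdf(10.0 * a, a)   # exp(49.5)

if __name__ == "__main__":
    print("Cauchy:   density at a / density at 10a =", cauchy_ratio)
    print("Gaussian: density at a / density at 10a =", gauss_ratio)
```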

I will return to the parameter a when discussing experimental tests (Sec. XD). 

The story that I have just told for 2-state systems goes through for any finite number 
of components. The "kicking" takes place in a larger Hilbert space, but the mathematics, 
including the role of the Cauchy distribution, goes through remarkably smoothly. 



$|\psi|^2$ as probability

A problem of great interest in quantum mechanics is whether it is possible to derive the
association of $|\psi|^2$ and probability. For example there was heated discussion of this issue at
a recent Séminaire Poincaré [61].

The derivation just given, in which the Cauchy distribution provides just the right asymptotics
to recover the $|\psi|^2$ relation, does not go through for any other power of $|\psi|$. This is
not a proof that in the Copenhagen approach you must get $|\psi|^2$. Rather it's the statement
that if you could derive the kicks from an underlying fluctuation or large deviation theory
of matter, you would also be deriving the usual relation of probability and wave function.

D. Experimental tests

There are two categories of test: (1) confounding Nature's efforts to provide a special 
state, and (2) noticing the Cauchy noise. 

It would seem easy to isolate an experiment so as to prevent outside influences from 
sending the microstate into one of the possible macroscopic outcomes (in which 
measurement of the system in that outcome state would disprove my theory). But in fact 
it is not easy. Even with very few atoms around, if they act coherently their impact can be 
great. In [1] I estimate the level of vacuum needed in a typical scattering experiment and 
find it to be extreme. EPR experiments seem more promising, since one can form a better 
idea of where the "specializing" must take place. In any case, I have not yet formulated 
a sharp test of this sort. I have also begun to worry lately that dark matter may have 
something to do with this, in which case isolation would be impossible. 

The Cauchy noise however might nevertheless reveal itself. This could happen by studying 
fluctuations in ordinary matter. After all, the functional form of the Cauchy distribution is 
well-known in physics: it's just the Lorentz line shape. So there are many physical variables 
distributed in this way. 

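A second manifestation of the noise comes from the parameter "a" in the Cauchy distribution,

$$ C_a(x) = \frac{a/\pi}{x^2 + a^2} . \tag{148} $$

To achieve perfect matching with what seem to be physical demands, one needed to satisfy
Eq. (145). Taking $f(\theta) = C_\gamma(\theta)$, such agreement would require $\gamma = 0$. It's easy to calculate
the probabilities if $\gamma \neq 0$. We need the sum

$$ F(\theta) = \sum_{n=-\infty}^{\infty} \frac{\gamma/\pi}{(\theta + 2n\pi)^2 + \gamma^2} = \frac{1}{2\pi}\,\mathrm{Im}\left(\frac{1}{\tan\left(\frac{1}{2}(\theta - i\gamma)\right)}\right) = \frac{\frac{1}{2\pi}\tanh\frac{\gamma}{2}}{\sin^2\frac{\theta}{2} + \cos^2\frac{\theta}{2}\,\tanh^2\frac{\gamma}{2}} . \tag{149} $$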
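
The closed form for the sum of Cauchy densities over kicks equivalent modulo $2\pi$, namely $F(\theta) = \frac{1}{2\pi}\tanh(\gamma/2) / [\sin^2(\theta/2) + \cos^2(\theta/2)\tanh^2(\gamma/2)]$, can be compared with a direct truncated summation. This is a minimal sketch; the function names, the cutoff `nmax`, and the sample values of $\theta$ and $\gamma$ are assumptions made for the illustration.

```python
import math


def cauchy_pdf(x, gamma):
    """Cauchy density with scale parameter gamma."""
    return (gamma / math.pi) / (x * x + gamma * gamma)


def F_truncated(theta, gamma, nmax=20_000):
    """Direct sum of cauchy_pdf(theta + 2*pi*n) over |n| <= nmax."""
    return sum(cauchy_pdf(theta + 2.0 * math.pi * n, gamma)
               for n in range(-nmax, nmax + 1))


def F_closed(theta, gamma):
    """Closed form of the same sum in terms of tanh(gamma/2)."""
    t = math.tanh(gamma / 2.0)
    s2 = math.sin(theta / 2.0) ** 2
    c2 = math.cos(theta / 2.0) ** 2
    return t / (2.0 * math.pi * (s2 + c2 * t * t))


if __name__ == "__main__":
    for theta, gamma in ((1.0, 0.1), (0.2, 0.5), (3.0, 0.01)):
        print(theta, gamma, F_truncated(theta, gamma), F_closed(theta, gamma))
```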



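For an initial state $u_\theta$, the probability of observing up is

$$ \Pr(\mathrm{up}) = \frac{F(\theta)}{F(\theta) + F(\theta + \pi)} = \frac{\cos^2\frac{\theta}{2} + \sin^2\frac{\theta}{2}\,\tanh^2\frac{\gamma}{2}}{1 + \tanh^2\frac{\gamma}{2}} . \tag{150} $$

For small $\gamma$ this gives

$$ \Pr(\mathrm{up}) = \cos^2\frac{\theta}{2} - \frac{\gamma^2}{4}\cos\theta . \tag{151} $$

This is not the standard result. An atom going in with spin up ($\theta = 0$) has probability $\gamma^2/4$
of being misread as down. If such an effect were detected, one could further get some idea
(from the size of $\gamma$) of the physical origins of this noise.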
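
To get a feel for the size of the deviation, the exact expression $\Pr(\mathrm{up}) = [\cos^2(\theta/2) + \sin^2(\theta/2)\tanh^2(\gamma/2)] / [1 + \tanh^2(\gamma/2)]$ can be evaluated next to its small-$\gamma$ form $\cos^2(\theta/2) - (\gamma^2/4)\cos\theta$. The sketch below is illustrative only; the function names and the value $\gamma = 0.01$ are assumptions and carry no experimental meaning.

```python
import math


def prob_up(theta, gamma):
    """Exact probability of 'up' for a Cauchy kick distribution of width gamma."""
    t2 = math.tanh(gamma / 2.0) ** 2
    c2 = math.cos(theta / 2.0) ** 2
    s2 = math.sin(theta / 2.0) ** 2
    return (c2 + s2 * t2) / (1.0 + t2)


def prob_up_small_gamma(theta, gamma):
    """Leading small-gamma form: cos^2(theta/2) - (gamma^2 / 4) * cos(theta)."""
    return math.cos(theta / 2.0) ** 2 - 0.25 * gamma ** 2 * math.cos(theta)


if __name__ == "__main__":
    gamma = 0.01  # hypothetical width, chosen only for illustration
    for theta in (0.0, 0.5, 1.0, math.pi / 2.0):
        print(theta, prob_up(theta, gamma), prob_up_small_gamma(theta, gamma))
    # Probability that a spin-up atom (theta = 0) is misread as down:
    print("misread:", 1.0 - prob_up(0.0, gamma), "vs gamma^2/4 =", gamma ** 2 / 4.0)
```

Note that `prob_up(theta, gamma) + prob_up(theta + math.pi, gamma)` equals 1, so the up and down probabilities remain normalized for any $\gamma$.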

Remark: There is already a source of small-angle error that is known from the "WAY" 
theorem [62-64]. Because the quantity measured is conserved by the total Hamiltonian, the 
work of Wigner, Araki and Yanase implies that there is necessarily a small error, on the order 
of the ratio of the angular momentum to be measured to that of the apparatus. Nevertheless, 
by selecting a suitable experimental context, the search for small angle deviations seems a 
promising avenue. 

E. Excerpt from Ref. [1] (Time's Arrows and Quantum Measurement), Sec. 6.3, on
the determinism implicit in this worldview.

This excerpt emphasizes the fact that the quantum measurement theory described in these 
notes (and in the book) requires a deterministic world. The entire history of the universe,
past and future, is fixed. 

The other claim is the more dramatic. First you can't control your initial state, and 
second it is always one of the presumably rare 'special' ones. For those who have read the 
earlier chapters of this book, this claim should not be surprising. The thermodynamic arrow 
of time is what gives us our strong prejudice on the arbitrariness of initial conditions and 
the specificity of final conditions. In a dynamical system for which the boundary conditions 
are naturally defined at more than one time, the selection of allowable states (from among 
all candidates macroscopically possible) will not favor initial conditions in this way. 

Therefore we can already anticipate that the constraint on initial states will have some- 
thing to do with the giving of conditions in the future, conflicting with our primitive intuition 
that only statements about the past influence the present state of a system. 

There is another claim implicit in this proposal. Suppose your laboratory is wealthy 
enough to have two cloud chambers and before doing the experiment you have to decide 
between them, based perhaps on choosing the one with a gas leak or the one with an 
unreliable compressor. (Well, maybe the lab is not so wealthy after all!) The chamber that 
you opt to use will turn out to be the one with the right 'special' state. Or suppose you 
decide to aim the beam a little differently. Then it will be other gas molecules that are 
primed for perfect detection or perfect non-detection. It follows that the 'special' states 
that occur are coordinated with your decision. But since (according to this theory) nothing 
ever happens except pure, unitary quantum evolution, the precursors of these states were 
heading where they were going before you made your 'decision.' So your decision was not 
a decision, and your wave function and that of the detectors are correlated. Pursuing this 
line of reasoning to ever greater scales, it follows that my ideas can only be valid if there is a 
single wave function for the entire universe. This wave function has the precise correlations 
necessary to guarantee 'special' states at every juncture where they are needed. 

Again, it is my hope that the edge on the foregoing assertion has been taken off by the 
earlier parts of this book in which I discussed the arrow of time and past and future boundary 
conditions. A future boundary condition can trivially generate long range correlations. For 
chaotic systems these correlations demand extreme precision. What the foregoing discussion 
implies is that if I propose to motivate the appearance of 'special' states by a future boundary 
condition, that boundary condition should involve the entire universe. So if I am right 
about this explanation of the quantum measurement problem, not only must we reexamine 
statistical mechanics, but cosmology plays a role as well. 

See also the notes to Sec. 6.3 of the book (pp. 220-221). 